Preface



The 7th Annual European Symposium on Algorithms (ESA ’99) was held in Prague,
Czech Republic, July 16-18, 1999. This continued the tradition of the meetings which
were held in

- 1993 Bad Honnef (Germany) 

- 1994 Utrecht (Netherlands) 

- 1995 Corfu (Greece) 

- 1996 Barcelona (Spain) 

- 1997 Graz (Austria) 

- 1998 Venice (Italy) 

(The proceedings of previous ESA meetings were published as Springer LNCS vol- 
umes 726, 855, 979, 1136, 1284, 1461.) 

In the short time of its history ESA (like its sister meeting SODA) has become a 
popular and respected meeting. 

The call for papers stated that the “Symposium covers research in the use, design, 
and analysis of efficient algorithms and data structures as it is carried out in com- 
puter science, discrete applied mathematics and mathematical programming. Papers 
are solicited describing original results in all areas of algorithmic research, including 
but not limited to: Approximation Algorithms; Combinatorial Optimization; Computa- 
tional Biology; Computational Geometry; Databases and Information Retrieval; Graph 
and Network Algorithms; Machine Learning; Number Theory and Computer Algebra; 
On-line Algorithms; Pattern Matching and Data Compression; Symbolic Computation. 
The algorithms may be sequential, distributed or parallel, and they should be analyzed 
either mathematically or by rigorous computational experiments. Submissions that re- 
port on experimental and applied research are especially encouraged.” 

A total of 122 papers were received. The program committee thoroughly reviewed
the papers (each paper was sent to at least 3 members of the PC), and after an electronic
discussion, agreement was reached during the PC meeting in Prague, March 26-28,
1999. (At this meeting the following members of the PC were physically present: G.
Billardi, H. Bodlaender, J. Diaz, A. Goldberg, M. Goemans, M. Kaufmann, B. Monien,
J. Matousek, E. Mayr, J. Nešetřil, P. Widmayer, G. Woeginger, while most of the other
PC members cooperated electronically.)

The program committee selected 44 papers, which are presented in this volume. 

The program of ESA’99 includes two invited talks by Bernhard Korte (University
of Bonn) and Moti Yung (Columbia, New York). 

As an experiment, ESA’99 was held at the same time as ICALP’99. Only history 
will show whether this scheme is suitable to the European context. 

ESA’99 was organized by the Centre of Discrete Mathematics, Theoretical
Computer Science and Applications (DIMATIA for short), which is a joint venture of
Charles University, the Czech Academy of Science and the Institute for Chemical
Technology (all based in Prague), together with its associates (presently in Barcelona,
Bielefeld, Bonn, Bordeaux, Budapest, Novosibirsk, Pilsen, and Pisa). Two similar
centers, DIMACS (Rutgers University, New Jersey) and PIMS (Vancouver, Canada), are
also associated members of DIMATIA. We thank all our partners for their support and
work. We would like to thank all the members of the organizing committee, and
particularly Mrs. Hana Casenska (DIMATIA) and Mrs. Anna Kotesovcova (CONFORG).
The electronic efficiency was made possible by Jiří Fiala, Jiří Sgall, Vit Novak, and
Pavel Valtr.

We also thank our industrial supporters Cedok, Telekom, Komercni banka, Mer- 
cedes Benz Bohemia, and Conforg (all based in Prague). DIMATIA is also supported 
by GACR grant 201/99/0242 and MSMT grant 055 (Kontakt). 

We hope that the present volume reflects the manifold spectrum of contemporary 
algorithmic research and that it convincingly demonstrates that this area is alive and 
well. We wish the next ESA (to be held in Saarbruecken) success and many excellent 
contributions. 



July 1999 



Jaroslav Nešetřil
Abstract. When attacking a distributed protocol, an adaptive adversary is able
to decide its actions (e.g., which parties to corrupt) at any time based on its entire 
view of the protocol including the entire communication history. Proving security 
of cryptographic protocols against adaptive adversaries is a fundamental prob- 
lem in cryptography. In this paper we consider “distributed public-key systems” 
which are secure against an adaptive adversary. 



1 Introduction 

Distributed public-key systems involve public/secret key pairs where the secret key is
distributively held by some number of the servers (using a secret sharing scheme). In
these systems, a key is split amongst a set of share-holders and a quorum of servers is
needed to act on a common input in order to produce a function value (a signature or
a cleartext). As long as an adversary does not corrupt a certain threshold of servers the
system remains secure (as opposed to centralized cryptosystems in which the
compromise of a single entity breaks the system). Function sharing (threshold) systems were
presented in [16, 17, 15]. Robust function sharing systems, in which the function can be
evaluated correctly even if the adversary causes share-holders it controls to misbehave
arbitrarily, were presented in [29, 25, 30]. Constructions of these systems are required to
be efficient (e.g., they should not involve generic “secure function evaluation” which is
assumed impractical [15]). The current trend for specific efficient solutions is reviewed
in [31, 28].

A fundamental problem in cryptography is coping with an adaptive adversary who
may, while a protocol is running, attack the protocol using actions based on its complete
view up to that point in the protocol. This problem was dealt with recently in the context
of multi-party computations (secure function evaluation), initially where parties erase
some of their information [4], and later even when they do not necessarily do so [6]. An
adaptively secure Oblivious Transfer protocol was also given in [3]. In this paper we
examine the problem of obtaining adaptive security for distributed public-key systems.
None of the known implementations of distributed cryptosystems have been proven
secure under the very powerful “adaptive” adversary model. We deal with both
discrete-log-based (DL-based) [18] and RSA-based [44] systems.

The major difficulty in proving the security of protocols against adaptive adversaries 
is being able to efficiently simulate (without actually knowing the secret keys) the view 
of an adversary which may corrupt parties dynamically, depending on its internal “un- 
known strategy.” The adversary’s corruption strategy may be based on values of public 

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 4-27, 1999.
ciphertexts, other public cryptographic values in the protocol, and the internal states of
previously corrupted parties. For example, the adversary could decide to corrupt next 
the party whose identity matches a one-way hash function applied to the entire com- 
munication history of the protocol so far. Since in an “actual execution,” an adaptive 
adversary does obtain such a view, this simulation argument is intuitively necessary to 
claim that the adversary obtains no useful information, and thus claim that the protocol 
is secure. Now, when a party is corrupted, its (simulated) internal state must be consis- 
tent with the current view of the adversary and publicly available values, which include 
public ciphertexts and other public cryptographic values (e.g., commitments). Without 
the secret keys, however, the simulator is not able to determine the true internal states 
of all parties simultaneously, and thus might have difficulty producing a consistent in- 
ternal state for an arbitrarily corrupted party. In other words, we may fail to simulate the 
“on-line” corruption and will need to backtrack and try again to produce a consistent 
view for the adversary. But after backtracking and proceeding with different ciphertexts 
or cryptographic values, the adaptive adversary may corrupt different parties (based on 
its unknown strategy). Since the adversary can corrupt subsets of users it has an expo- 
nentially large set of corruption possibilities, and since it applies an unknown strategy, 
we may not be able to terminate the simulation in expected polynomial time, which is 
a requirement for claiming security. 

In distributed public-key systems, the problem of adaptive security is exacerbated by 
the fact that there is generally “public function and related publicly-committed robust- 
ness information” available to anyone, which as discussed above, needs to be consistent 
with internal states of parties which get corrupted. This is the main cause of difficulties 
in the proof of security. 

Our Contributions and Techniques: We give a new set of techniques that can be
used to construct distributed DL-based and RSA-based public-key systems with adaptive
security. Since the simulation-based proofs of the earlier techniques fail against an
adaptive adversary, we have to employ new ideas. The driving “meta idea” is to develop
techniques that assure, in spite of the “exponential set of behaviors” of the adversary,
that the adversary can only “disrupt” the simulation with polynomial probability. This
argument will assure simulatability and thus a “proof of security”. The basic principle
is based on the notion of a “faking server.” The simulator exploits the “actions” of this
server to assure that the view is simulatable while not knowing the secret key. This
server is chosen at random and its public actions are indistinguishable from those of an
honest server to the adversary. We have to backtrack the simulation only if the adversary
corrupts this special server. Since there is only one faking server, and since regardless
of its corruption strategy, the adversary has a polynomial chance (at least 1/(t + 1)) of
not corrupting this one server, we will be able to complete the simulation in expected
polynomial time. 
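The counting argument above can be illustrated with a small sketch. It assumes, purely for illustration, that each simulation attempt succeeds independently with probability exactly 1/(t + 1), so the number of attempts is geometric with mean t + 1. The function names and the uniformly random corruption strategy are hypothetical and are not part of the paper's proof.

```python
from fractions import Fraction
import random


def success_lower_bound(t: int) -> Fraction:
    """Lower bound from the text on the chance the faking server is never corrupted."""
    return Fraction(1, t + 1)


def expected_attempts_upper_bound(t: int) -> Fraction:
    """Mean of a geometric distribution with success probability 1/(t + 1)."""
    return 1 / success_lower_bound(t)


def simulate_attempts(t: int, rng: random.Random) -> int:
    """Count attempts until the faking server survives.

    Hypothetical adversary: it corrupts t of the t + 1 active servers,
    chosen uniformly at random, in every attempt.
    """
    attempts = 0
    while True:
        attempts += 1
        faking = rng.randrange(t + 1)            # simulator's secret random choice
        corrupted = set(rng.sample(range(t + 1), t))
        if faking not in corrupted:              # no backtracking needed
            return attempts
```

Under these assumptions the expected number of backtracks grows linearly in t, which is what makes the simulation run in expected polynomial time.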

We employ non-binding encryption and develop the notion of “detached
commitments”. These commitments are used to ensure correct behavior of servers, yet have
no “hard attachment” to the rest of the system, even the secret key itself! We show how
to work with these detached commitments, e.g., using “function representation
transformations” like “poly-to-sum” and “sum-to-poly” (which we build based on [23]). We
also show how to maintain robustness by constructing simulatable “soft attachments”
from these detached commitments into the operations of the rest of the system. By
using detached commitments, the simulator has the freedom to separate its simulation
of the secret key representation (which, as it turns out, doesn’t even depend on the
secret key) and its simulation of secret-key-based function applications (which
naturally depends on the secret key), thus enabling a proof of security. The soft attachments
are constructed using efficient zero-knowledge proofs-of-knowledge. The protocols we
give for these are similar to [10], but are novel in the following ways: (1) there is one
“setup” protocol for all the “proof” protocols, which allows concurrency in the “proof”
protocols, and (2) the setup and proof protocols are not as tightly related (i.e., the setup
does not prove knowledge of a commitment based on exactly what the proof protocol
is trying to prove) but still achieve statistical zero-knowledge (no reliance on
computational assumptions). We believe these ZK-proof techniques will be useful in developing
future threshold cryptosystems.

Our techniques maintain “optimal resilience,” namely, the protocols can withstand
any minority of misbehaving parties (t faults out of l parties, where l ≥ 2t + 1 is allowed).
Our main results are:

Theorem 1. There exists an adaptively-secure robust DL-based optimal-resilient
(t, l)-threshold public-key system.

Theorem 2. There exists an adaptively-secure robust RSA-based optimal-resilient
(t, l)-threshold public-key system.

Beyond robustness of distributed cryptosystems, there is the notion of proactive security,
which is a strengthening of the security of a system to cope with mobile adversaries
[39]. In proactively-secure systems, the share-holders must maintain the key and
rerandomize it periodically. Thus, an adversary which corrupts less than a threshold in
every period cannot break the system (even if over time every share-holder is corrupted).
Proactive public-key systems were presented in [33, 24, 7, 23, 43]. Furthermore,
initiating the above systems without a trusted key generator or “dealer” (whose presence
provides a single source of failure) requires a “distributed key generation” procedure.
Such protocols were given in [40, 5, 26]. The techniques in this paper for the DL-based
systems can be used to construct a proactive DL-based system and also extend to key
generation. We present protocols for these, but omit the proofs due to space
considerations.

Corollary 3. There exists an adaptively-secure proactive DL-based optimal-resilient
(t, l)-threshold public-key system.

Corollary 4. There exists an adaptively-secure DL-based optimal-resilient
(t, l)-threshold key-generation system.

In a companion paper (also [27]) we extend this work to key generation and
proactive maintenance of RSA-based systems, which require additional techniques.

2 Model and Definitions 

Our system consists of l servers S = {S_1, ..., S_l}. A server is corrupted if it is
controlled by the adversary. When a server is corrupted, we assume “for security” that the
adversary sees all the information currently on that server. On the other hand, the
system should not “open” secrets of unavailable servers. Namely, we separate availability
faults from security faults (and do not cause security exposures due to unavailability).
We assume that all uncorrupted servers receive all messages that are broadcast, and
may retrieve the information from messages encrypted with a public key, if they know
the corresponding private key. Our communication model is similar to [34]. All
participants communicate via an authenticated bulletin board [8] in a synchronized manner.

The adversary: Our threshold schemes assume a stationary adversary which stays at
the corrupt processor (extensions to a mobile adversary as defined in [15] are assumed
by the proactive protocol). It is t-restricted; namely it can, during the life-time of the
system, corrupt at most t servers. The actions of an adversary at any time may include
submitting messages to the system to be signed, corrupting servers, and broadcasting
arbitrary information on the communication channel. The adversary is adaptive; namely
it is allowed to base its actions not only on previous function outputs, but on all the
information that it has previously obtained during the execution of the protocol.

DL-based and RSA-based systems: For information on the basics of DL-based and
RSA-based systems, see Appendix A. It includes, among other information, details of
the variants of secret sharing schemes that are used, such as Shamir threshold secret
sharing and Pedersen unconditionally-secure threshold verifiable secret sharing
((t, l)-US-VSS) over known groups, along with secret sharing and unconditionally-secure
threshold verifiable secret sharing (INT-(t, l)-US-VSS) over the integers, with check
shares computed modulo an RSA modulus. 
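Since Appendix A is not reproduced here, the following is a minimal sketch of the standard Pedersen construction that (t, l)-US-VSS refers to, under stated assumptions. The toy group parameters (p = 23, q = 11, g = 4, h = 9) and all function names are chosen for illustration only; real parameters would be cryptographically large, and nobody would know the discrete log of h with respect to g.

```python
import random

# Toy parameters (assumption): g and h both have prime order q modulo p.
P, Q, G, H = 23, 11, 4, 9


def eval_poly(coeffs, x):
    """Evaluate a polynomial with coefficients in Z_q at the point x."""
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def pedersen_deal(secret, t, l, rng):
    """Share `secret` with a degree-t polynomial a() and a companion a'()."""
    a = [secret % Q] + [rng.randrange(Q) for _ in range(t)]
    a2 = [rng.randrange(Q) for _ in range(t + 1)]
    # Public commitments to coefficient pairs: C_k = g^{a_k} h^{a'_k} mod p.
    commitments = [pow(G, c, P) * pow(H, c2, P) % P for c, c2 in zip(a, a2)]
    shares = {i: (eval_poly(a, i), eval_poly(a2, i)) for i in range(1, l + 1)}
    return shares, commitments


def check_share(i, commitments):
    """Check share A_i = prod_k C_k^(i^k) mod p, computable from public values."""
    result = 1
    for k, c in enumerate(commitments):
        result = result * pow(c, pow(i, k, Q), P) % P
    return result


def verify_share(i, s, s2, commitments):
    """Server i accepts (s_i, s'_i) if g^{s_i} h^{s'_i} equals its check share."""
    return pow(G, s, P) * pow(H, s2, P) % P == check_share(i, commitments)
```

The companion polynomial is what makes the commitment information-theoretically hiding: the check share A_i alone reveals nothing about s_i.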

Distributed Public-Key Systems: We will say that the secret key x is shared among
the servers, and each server S_i holds share x_i. The public key associated with x will be
called y. We say a (t, l)-threshold system is a system with l servers that is designed to
withstand a t-restricted adaptive adversary. Formal definitions for distributed public-key
systems are given in Appendix B.



3 Techniques 

The main problem with proving security and robustness against adaptive adversaries is 
that public values such as ciphertexts and commitments are linked to actual cleartext 
values in an undeniable fashion. To “detach” ciphertexts from their cleartext values we 
simply employ semantically-secure non-committing encryption [4]. In fact, our (full) 
security proofs first assume perfectly secret channels and then add the above (a step we 
omit here). 

A more involved issue concerns the commitments. We know that the collection of
techniques needed to underlie distributed public-key systems includes: distributed
representation methods (polynomial sharing, sum (additive) sharing), representation
transformers which move between different ways to represent a function (poly-to-sum,
sum-to-poly), as well as a set of “elementary” distributed operations (add, multiply, invert).
For example, the “poly-to-sum” protocol is executed by t + 1 servers at a time, and
transforms (t + 1)-out-of-l polynomial-based sharings to (t + 1)-out-of-(t + 1) additive
sharings. We need to have such techniques (motivated by [26, 23]) which are secure and
robust against adaptive adversaries. We will rely on new zero-knowledge proof
techniques (see Appendix C), as well as on shared representation of secrets as explained
in Sections A.1 and A.2. The notation “2poly” refers to a polynomial and its
companion polynomial shared with (t, l)-US-VSS (which is “unconditionally secure VSS”)
(or INT-(t, l)-US-VSS (which is the same but over the integers)). The notation “2sum”
refers to two additive sharings, with check shares that contain both additive shares of a
server (similar to the check shares in (t, l)-US-VSS). In describing the DL-based
protocols, unless otherwise noted we will assume multiplication is performed mod p and
addition (of exponents) is performed mod q. In describing the RSA-based protocols,
unless otherwise noted we will assume multiplication is performed mod N and addition
(of exponents) is performed over the integers (i.e., not “mod” anything).

3.1 2poly-to-2sum 

The goal of 2poly-to-2sum is to transform t-degree polynomials a() and a'() used in
(t, l)-US-VSS into t + 1 additive shares for each secret a(0) and a'(0), with corresponding
check shares. The idea is to perform interpolation.* The DL-based scheme shown
in Figure 1 does not actually require any communication, since all check shares can be
computed from public information. We note that in Step 2 each s_i and s'_i is a multiple
of L, so S_i can actually compute b_i and b'_i over the integers. The RSA-based scheme
shown in Figure 2 is similar, but requires S_i to broadcast the check share, since it cannot
be computed by every server.



1. Initial configuration: (t, l)-US-VSS (parameters: (p, q, g, h)) with t-degree polynomials
a() and a'(), and a set Λ of t + 1 server indices. For all i ∈ Λ, recall S_i holds shares s_i and
s'_i with corresponding check share A_i = g^{s_i} h^{s'_i}.

2. S_i computes additive shares b_i = s_i z_{i,Λ} and b'_i = s'_i z_{i,Λ}, where
z_{i,Λ} = ∏_{j∈Λ\{i}} (0 − j)/(i − j) is the Lagrange coefficient for interpolation at 0.

3. Every server computes the check shares B_i = (A_i)^{z_{i,Λ}} for all i ∈ Λ. (Note that there is
no communication, since the additive shares can be computed individually by each
share-holder, and all check shares can be computed from publicly available verification shares.)



Fig. 1. 2poly-to-2sum: DL-based scheme 
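The steps of Figure 1 can be sketched as follows, assuming the toy group p = 23, q = 11, g = 4, h = 9 (illustrative values only) and Python 3.8+ for modular inverses via `pow(x, -1, q)`. The function names are hypothetical.

```python
# Toy parameters (assumption): g and h have prime order q modulo p.
P, Q, G, H = 23, 11, 4, 9


def lagrange_at_zero(i, indices):
    """z_{i,Lambda}: product over j in Lambda, j != i, of (0 - j)/(i - j) mod q."""
    num, den = 1, 1
    for j in indices:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q


def poly_to_sum(i, s, s2, indices):
    """Step 2: S_i turns its polynomial shares into an additive dual-share."""
    z = lagrange_at_zero(i, indices)
    return s * z % Q, s2 * z % Q


def additive_check_share(i, A_i, indices):
    """Step 3: anyone derives B_i = A_i^{z_{i,Lambda}} from the public A_i."""
    return pow(A_i, lagrange_at_zero(i, indices), P)
```

The additive shares of the t + 1 servers in Λ sum to a(0) and a'(0) modulo q, and each B_i equals g^{b_i} h^{b'_i} without any message being sent.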



3.2 2sum-to-2sum 

The goal of 2sum-to-2sum is to randomize additive dual-shares (most likely obtained 
from a 2poly-to-2sum) and update the corresponding check shares. The DL-based scheme 
is in Figure 3 and the RSA-based scheme is in Figure 4. 

* In [23], poly-to-sum also performed a rerandomization of the additive shares. We split that 
into a separate protocol for efficiency, since sometimes 2poly-to-2sum is used without a reran- 
domization of additive shares. 
1. Initial configuration: INT-(t, l)-US-VSS (parameters: (N, g, h)) with t-degree polynomials
a() and a'(), and a set Λ of t + 1 server indices. For all i ∈ Λ, recall S_i holds shares s_i
and s'_i with corresponding check share A_i = g^{s_i} h^{s'_i}.

2. For all i ∈ Λ, S_i computes the additive shares b_i = s_i z_{i,Λ} and b'_i = s'_i z_{i,Λ} and publishes
B_i = g^{b_i} h^{b'_i}.

3. All servers verify B_i for all i ∈ Λ using (A_i)^{V_{i,Λ}} = (B_i)^{V'_{i,Λ}}, where V_{i,Λ} = ∏_{j∈Λ\{i}} (0 − j)
and V'_{i,Λ} = ∏_{j∈Λ\{i}} (i − j). If the verification for a given B_i fails, each server broadcasts a
(Bad, i) message and quits the protocol.



Fig. 2. 2poly-to-2sum: RSA-based scheme 
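The integer-only verification in Step 3 of Figure 2 can be sketched as below. This is an illustration under several assumptions: a toy modulus N = 253 = 11 · 23 with bases g = 2, h = 3, l = 4 servers, L = l!, and shares that are multiples of L as the text notes. Negative exponents rely on Python 3.8+, where `pow(x, -k, n)` uses a modular inverse. All names are hypothetical.

```python
from math import factorial, prod

# Toy parameters (assumption): N = 11 * 23, with g and h coprime to N.
N, G, H = 253, 2, 3
NUM_SERVERS = 4
L = factorial(NUM_SERVERS)


def v_pair(i, indices):
    """Return (V, V') with V = prod (0 - j) and V' = prod (i - j) over j != i."""
    others = [j for j in indices if j != i]
    return prod(0 - j for j in others), prod(i - j for j in others)


def additive_share(i, s, indices):
    """b_i = s_i * V / V', an exact integer because s_i is a multiple of L."""
    v, v2 = v_pair(i, indices)
    if (s * v) % v2 != 0:
        raise ValueError("share is not divisible; expected a multiple of L")
    return s * v // v2


def commit(x, x2):
    """g^x h^{x'} mod N; x and x' may be negative integers."""
    return pow(G, x, N) * pow(H, x2, N) % N


def verify_published(i, A_i, B_i, indices):
    """Step 3: check (A_i)^V == (B_i)^{V'} mod N without any division."""
    v, v2 = v_pair(i, indices)
    return pow(A_i, v, N) == pow(B_i, v2, N)
```

Raising both sides to integer powers avoids computing the fraction V/V' in the exponent, which the servers cannot do modulo the unknown group order.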



1. Initial configuration: There is a set Λ of t + 1 server indices. For all i ∈ Λ, S_i holds additive
dual-share (b_i, b'_i), with corresponding check share B_i = g^{b_i} h^{b'_i}.

2. For all i ∈ Λ, S_i chooses r_{i,j} ∈ Z_q and r'_{i,j} ∈ Z_q for j ∈ Λ\{i}.

3. For all i ∈ Λ, S_i sets r_{i,i} = b_i − Σ_{j∈Λ\{i}} r_{i,j} and r'_{i,i} = b'_i − Σ_{j∈Λ\{i}} r'_{i,j}.

4. For all i ∈ Λ, S_i privately transmits r_{i,j} and r'_{i,j} to S_j for all j ∈ Λ\{i}.

5. For all i ∈ Λ, S_i publishes R_{i,j} = g^{r_{i,j}} h^{r'_{i,j}} for j ∈ Λ\{i}.

6. All servers can compute R_{i,i} = B_i / ∏_{j∈Λ\{i}} R_{i,j} for all i ∈ Λ.

7. For all j ∈ Λ, S_j verifies R_{i,j} = g^{r_{i,j}} h^{r'_{i,j}}. If the verification fails, S_j broadcasts an
(Accuse, i, R_{i,j}) message, to which S_i responds by broadcasting r_{i,j} and r'_{i,j}. If S_i does not
respond, or R_{i,j} ≠ g^{r_{i,j}} h^{r'_{i,j}} (which all servers can now test), then each server broadcasts a
(Bad, i) message and quits the protocol.

8. For all j ∈ Λ, S_j computes d_j = Σ_{i∈Λ} r_{i,j}, d'_j = Σ_{i∈Λ} r'_{i,j}, and D_j = ∏_{i∈Λ} R_{i,j}.



Fig. 3. 2sum-to-2sum: DL-based scheme 
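A minimal sketch of the honest-case flow of Figure 3 follows, assuming the toy group p = 23, q = 11, g = 4, h = 9 and Python 3.8+. It runs all servers in one process and omits the accusation handling of Step 7; the function names are hypothetical.

```python
import random

# Toy parameters (assumption): g and h have prime order q modulo p.
P, Q, G, H = 23, 11, 4, 9


def commit(x, x2):
    return pow(G, x, P) * pow(H, x2, P) % P


def split_dual_share(i, b, b2, indices, rng):
    """Steps 2-3: S_i splits (b_i, b'_i) into one random piece per server."""
    r, r2 = {}, {}
    for j in indices:
        if j != i:
            r[j], r2[j] = rng.randrange(Q), rng.randrange(Q)
    r[i] = (b - sum(r.values())) % Q      # S_i keeps the remainder itself
    r2[i] = (b2 - sum(r2.values())) % Q
    return r, r2


def rerandomize(shares, rng):
    """Honest-case run: returns {j: (d_j, d'_j, D_j)} for the new sharing."""
    indices = sorted(shares)
    pieces, published = {}, {}
    for i in indices:
        b, b2 = shares[i]
        r, r2 = split_dual_share(i, b, b2, indices, rng)
        pieces[i] = (r, r2)
        # Step 5: S_i publishes R_{i,j} for j != i.
        published[i] = {j: commit(r[j], r2[j]) for j in indices if j != i}
        # Step 6: everyone derives R_{i,i} = B_i / prod_{j != i} R_{i,j}.
        others = 1
        for value in published[i].values():
            others = others * value % P
        published[i][i] = commit(b, b2) * pow(others, -1, P) % P
    new_shares = {}
    for j in indices:
        # Step 8: S_j adds up the pieces it received and multiplies the R_{i,j}.
        d = sum(pieces[i][0][j] for i in indices) % Q
        d2 = sum(pieces[i][1][j] for i in indices) % Q
        D = 1
        for i in indices:
            D = D * published[i][j] % P
        new_shares[j] = (d, d2, D)
    return new_shares
```

The sums of the d_j and d'_j equal the sums of the original b_i and b'_i, while each individual share is freshly random.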



3.3 2sum-to-1sum

The goal of 2sum-to-1sum is to reveal check shares corresponding to the first half of
additive dual-shares, and prove they are correct. These proofs form the “soft attachments”
from the information-theoretically secure verification shares to the computationally
secure check shares that must correspond to the actual secret. The DL-based scheme is
shown in Figure 5 and the RSA-based scheme is shown in Figure 6.

4 Protocols 

We now present protocols for threshold cryptographic function application for both
DL-based and RSA-based systems. The security and robustness of these protocols are
proven in Appendix E.

4.1 DL-based threshold function application 

Here we consider any DL-based (t, l)-threshold function application protocol that works
by (1) constructing a verifiable additive representation of the secret x over t + 1 servers
1. Initial configuration: There is a set Λ of t + 1 server indices. For all i ∈ Λ, S_i holds additive
dual-share (b_i, b'_i), with corresponding check share B_i = g^{b_i} h^{b'_i}.

2. For all i ∈ Λ, S_i chooses r_{i,j} and r'_{i,j} at random from their respective ranges for j ∈ Λ\{i}.

3. For all i ∈ Λ, S_i sets r_{i,i} = b_i − Σ_{j∈Λ\{i}} r_{i,j} and r'_{i,i} = b'_i − Σ_{j∈Λ\{i}} r'_{i,j}.

4. For all i ∈ Λ, S_i privately transmits r_{i,j} and r'_{i,j} to S_j for all j ∈ Λ\{i}.

5. For all i ∈ Λ, S_i publishes R_{i,j} = g^{r_{i,j}} h^{r'_{i,j}} for j ∈ Λ\{i}.

6. All servers can compute R_{i,i} = B_i / ∏_{j∈Λ\{i}} R_{i,j} for all i ∈ Λ.

7. For all j ∈ Λ, S_j verifies that each r_{i,j} and r'_{i,j} received is in the correct range and that
R_{i,j} = g^{r_{i,j}} h^{r'_{i,j}}. If the verification fails, S_j broadcasts an (Accuse, i, R_{i,j}) message, to which S_i
responds by broadcasting r_{i,j} and r'_{i,j}. If S_i does not respond, r_{i,j} or r'_{i,j} is not in the correct
range, or R_{i,j} ≠ g^{r_{i,j}} h^{r'_{i,j}} (which all servers can now test), then each server broadcasts a
(Bad, i) message and quits the protocol.

8. For all j ∈ Λ, S_j computes d_j = Σ_{i∈Λ} r_{i,j}, d'_j = Σ_{i∈Λ} r'_{i,j}, and D_j = ∏_{i∈Λ} R_{i,j}.



Fig. 4. 2sum-to-2sum: RSA-based scheme 



1. Initial configuration: Parameters (p, q, g, h). There is a set Λ of t + 1 server indices.
For all i ∈ Λ, S_i holds additive dual-share (d_i, d'_i), with corresponding check share
D_i = g^{d_i} h^{d'_i}. Also, all servers S_i with i ∈ Λ have performed a ZK-proof-setup protocol
ZKSETUP-DL(p, q, g, h) with all other servers.

2. For all i ∈ Λ, S_i broadcasts E_i = g^{d_i}.

3. For all i ∈ Λ, S_i performs a ZK-proof of knowledge ZKPROOF-DL-REP(p, q, g, h, E_i, D_i)
with all other servers. Recall that this is performed over a broadcast channel so all servers
can check if the ZK-proof was performed correctly.

4. If a server detects that for some i ∈ Λ, S_i fails to perform the ZK-proof correctly, that server
broadcasts a message (Bad, i) and quits the protocol.



Fig. 5. 2sum-to-1sum: DL-based scheme



with check shares over g, (2) finishing the function application with those t + 1 servers
(we will call this the additive application step), if there is no misbehavior, and (3) going
back to step (1) if misbehavior is detected, discarding servers which have misbehaved
and using t + 1 remaining servers. We assume that there is a simulator for the additive
application step for a message m which can simulate the step with inputs consisting of
t + 1 additive shares with t + 1 check shares, where at most one additive share does
not correspond to its check share, and a signature on m. The simulator fails only if the
“faking server” (the one containing the unmatched share and check share) is corrupted,
and otherwise provides a view to the adversary which is perfectly indistinguishable
from the view the adversary would have in the real protocol. Most robust threshold
DL-based protocols against static adversaries, like AMV-Harn [1, 32] and El-Gamal
decryption [21], work this way. For an example with AMV-Harn signatures, see [36].
We show how to use this technique for static adversaries to construct a protocol that
withstands an adaptive adversary. Specifically, we use a (t, l)-US-VSS representation
1. Initial configuration: Parameters (N, e, g, h). There is a set Λ of t + 1 server indices.
For all i ∈ Λ, S_i holds additive dual-share (d_i, d'_i), with corresponding check share
D_i = g^{d_i} h^{d'_i}. Also, all servers S_i with i ∈ Λ have performed a ZK-proof-setup protocol
ZKSETUP-RSA(N, e, g, h) with all other servers.

2. For all i ∈ Λ, S_i broadcasts E_i = m^{d_i}, where m is the message to be signed, or more
generally, the value to which the cryptographic function is being applied.

3. For all i ∈ Λ, S_i performs a ZK-proof of knowledge
ZKPROOF-IF-REP(N, e, m, g, h, E_i, D_i) with all other servers. Recall that this is
performed over a broadcast channel so all servers can check if the ZK-proof was
performed correctly.

4. If a server detects that for some i ∈ Λ, S_i fails to perform the ZK-proof correctly, that server
broadcasts a message (Bad, i) and quits the protocol.



Fig. 6. 2sum-to-1sum: RSA-based scheme



to store the secret, and for function application we use 2poly-to-1sum (shorthand for
the concatenation of 2poly-to-2sum, 2sum-to-2sum, and 2sum-to-1sum) to construct
the verifiable additive representation of the secret. We call this the Basic DL-based
Threshold Protocol. The protocol is given in Figure 7.



1. Initial configuration: DL-based system parameters: (p, q, g).

2. The dealer generates h ∈_R Z_p^*, x, x' ∈_R Z_q, y = g^x, and a (t, l)-US-VSS on secrets x, x'.

3. Each (ordered) pair of servers (S_i, S_j) performs ZKSETUP-DL_{S_i,S_j}(p, q, g, h).

4. Each server maintains a list G of server indices for servers that have not misbehaved (i.e.,
they are considered good).

5. When a message m needs to be signed, the following DistApply protocol is run:

(a) A set Λ ⊆ G with |Λ| = t + 1 is chosen in some public way.

(b) 2poly-to-2sum is run. If there are misbehaving servers, their indices are removed from
G and the protocol loops to Step 5a.

(c) 2sum-to-2sum is run. If there are misbehaving servers, their indices are removed from
G and the protocol loops to Step 5a.

(d) 2sum-to-1sum is run. If there are misbehaving servers, their indices are removed from
G and the protocol loops to Step 5a.

(e) The additive application step of the signature protocol is run. If there are misbehaving
servers, their indices are removed from G and the protocol loops to Step 5a.

(f) All values created during the signing protocol for m are erased.



Fig. 7. Basic DL-based Threshold Protocol 
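The retry structure of DistApply (Step 5) can be sketched as a control loop. This is only an outline under assumptions: each sub-protocol is modeled as a hypothetical callable that receives the active set and returns the indices it found misbehaving, the "public way" of choosing Λ is taken to be the t + 1 smallest good indices, and the erasure in Step 5f is left out.

```python
def dist_apply(good, t, steps, choose=sorted):
    """Run the given steps on t + 1 good servers, retrying after misbehavior.

    good   -- iterable of indices currently considered good (the list G)
    steps  -- callables modeling 2poly-to-2sum, 2sum-to-2sum, 2sum-to-1sum and
              the additive application step; each returns misbehaving indices
    choose -- public, deterministic ordering used to pick the active set
    """
    good = set(good)
    while len(good) >= t + 1:
        active = choose(good)[: t + 1]         # Step 5a
        for step in steps:                      # Steps 5b-5e
            bad = set(step(active))
            if bad:
                good -= bad                     # discard and loop to Step 5a
                break
        else:
            return active, good                 # every step completed cleanly
    raise RuntimeError("fewer than t + 1 good servers remain")
```

Because at most t servers can ever be removed and l ≥ 2t + 1, an honest majority guarantees that the loop reaches a clean run.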



4.2 RSA-based threshold function application 

We define Basic RSA-based protocols analogously to Basic DL-based protocols. The
main change is that the version of VSS over the integers (INT-(t, l)-US-VSS) should be
used. (An example of this type of protocol for RSA signature and decryption functions
is given in [23].) This allows the simulator to construct a view for the adversary which
is statistically indistinguishable (as opposed to perfectly indistinguishable in the
DL-based protocols) from the view the adversary would have in the real protocol. We also
shortcut the additive application step, and simply form the partial RSA signatures in the
2sum-to-1sum step. The protocol is given in Figure 8.



1. The dealer generates an RSA public/private key (N, e, d), and computes public value x*
and secret value x such that d = x* + L^2 x mod φ(N), as in [24].^a Then the dealer generates
generators g, h ∈_R Z_N^*, a random companion secret x', and an INT-(t, l)-US-VSS (with K = N,
i.e., the range of the secret is assumed to be [0, N]) on secrets x, x' with parameters (N, g, h).

2. Each (ordered) pair of servers (S_i, S_j) performs ZKSETUP-RSA_{S_i,S_j}(N, e, g, h).

3. Each server maintains a list G of server indices for servers that have not misbehaved (i.e.,
they are considered good).

4. When a message m needs to be signed, the following DistApply protocol is run:

(a) A set Λ ⊆ G with |Λ| = t + 1 is chosen in some public way.

(b) 2poly-to-2sum is run. If there are misbehaving servers, their indices are removed from
G and the protocol loops to Step 4a.

(c) 2sum-to-2sum is run. If there are misbehaving servers, their indices are removed from
G and the protocol loops to Step 4a.

(d) 2sum-to-1sum is run. If there are misbehaving servers, their indices are removed from
G and the protocol loops to Step 4a. If there is no misbehavior, the signature on m can
be computed from the partial signatures generated in this step.

(e) All values created during the signing protocol for m are erased.

^a Recall that x* is computed using only the public values N, e, L.



Fig. 8. Basic RSA-based Threshold Protocol 



4.3 2sum-to-2poly 

The protocol is given in Figure 9.

4.4 DL-based Key Generation 

In Figure 10 we give the protocol for DL-based Key Generation. The major issues are:
(1) generating h (the element with unknown logarithm) in a distributed way, so as not
to rely on a centralized trusted entity to generate h without knowing DL(h, g), yet (2)
allowing a reduction in the robustness proof from finding DL(h', g) (for some random h'
in a discrete log problem instance) to finding DL(h, g). Finally, (3) the participants share
their contribution to the public/private key pair (y, x) prior to learning any
(information-theoretic) knowledge, so as to avoid “restarts” which may introduce biases regarding
the generated key.
1. Initial configuration: Parameters (p, q, g, h). There is a set Λ of server indices. For all
i ∈ Λ, S_i holds additive dual-share (d_i, d'_i), with corresponding check share D_i = g^{d_i} h^{d'_i}.

2. For each i ∈ Λ, S_i shares d_i and d'_i using (t, l)-US-VSS, say with polynomials v_i() and v'_i().

3. S_j computes the sums v(j) = Σ_{i∈Λ} v_i(j) and v'(j) = Σ_{i∈Λ} v'_i(j). The verification shares for
v() and v'() can be computed from the verification shares for v_i() and v'_i(), for i ∈ Λ.

4. If a verification fails for the (t, l)-US-VSS from S_i, each server broadcasts (Bad, i). When
2sum-to-2poly is used for key generation, this server is simply removed from Λ, and the
protocol proceeds on the smaller set Λ. When 2sum-to-2poly is used for proactive
maintenance, the servers quit the protocol.



Fig. 9. 2sum-to-2poly: DL-based scheme 
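The share arithmetic of Figure 9 can be sketched for the first half of the dual-shares only, assuming q = 11 as a toy modulus and Python 3.8+. Companion polynomials, verification shares, and the handling of (Bad, i) are omitted, and the function names are hypothetical.

```python
import random

Q = 11  # toy prime modulus (assumption)


def share_polynomial(secret, t, rng):
    """Step 2: a random degree-t polynomial v_i() with v_i(0) = d_i."""
    return [secret % Q] + [rng.randrange(Q) for _ in range(t)]


def eval_poly(coeffs, x):
    return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q


def sum_to_poly(additive, t, l, rng):
    """Step 3: server j's new share is v(j) = sum over i of v_i(j)."""
    polys = [share_polynomial(d, t, rng) for d in additive.values()]
    return {j: sum(eval_poly(v, j) for v in polys) % Q for j in range(1, l + 1)}


def interpolate_at_zero(points):
    """Recover v(0) from any t + 1 points, to check the sketch."""
    total = 0
    for i, y in points.items():
        num = den = 1
        for j in points:
            if j != i:
                num = num * (-j) % Q
                den = den * (i - j) % Q
        total = (total + y * num * pow(den, -1, Q)) % Q
    return total
```

Since v() is the sum of degree-t polynomials, any t + 1 of the new shares interpolate to the sum of the additive shares.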



1. Initial configuration: DL-based system parameters: {p,q,g). 

2. Generate shares of h\ Each server Si generates a random r,- e Zq and computes hi = g^‘. 
Then each server Si broadcasts hi. 

3. Each server Sj tests hj = 1 mod p, with I the set of indices in which this test passes. 

4. Each (ordered) pair of servers $(S_i,S_j)$ from $I \times I$ performs ZKsetup-DL$_{S_i,S_j}(p,q,g,h_j)$.
(This is used as the ZKsetup-DL protocol for all later ZKproof-DL and
ZKproof-DL-REP protocols.)

5. Each (ordered) pair of servers $(S_i,S_j)$ from $I \times I$ performs
ZKproof-DL$_{S_i,S_j}(p,q,g,h_i,h_i)$.

6. For all $i \in I$ in which $S_i$ misbehaved in either the ZKsetup-DL or ZKproof-DL
protocol, i is removed from I.

7. $h = \prod_{i \in I} h_i$.

8. Each server $S_i$ randomly chooses an additive share of the secret key and a companion
($s_i, s'_i \in Z_q$) and broadcasts the corresponding check share ($g^{s_i} h^{s'_i}$).

9. The servers perform a 2sum-to-2poly to construct a shared representation of the secret
$x = \sum_i s_i$.

10. The servers perform 2poly-to-1sum, and construct the public key $y = g^x$ from the product
of the resulting check shares.



Fig. 10. DL-based Distributed Key Generation Protocol 



4.5 DL-based proactive maintenance 

For DL-based proactive maintenance, we perform an update by running 2poly-to-2sum 
on the secret polynomials, and then 2sum-to-2poly. After 2sum-to-2poly, each server 
erases all previous share information, leaving just the new polynomial shares. If there 
is misbehavior by a server in either protocol, the procedure is restarted with new partic- 
ipants (here restarts do not introduce statistical biases and do not reduce the protocol’s 
security). The protocol is given in Figure 11. 

1. Initial configuration: (t,l)-US-VSS (parameters: $(p,q,g,h)$) with t-degree polynomials
$a()$ and $a'()$.

2. Each server maintains a list $\mathcal{G}$ of server indices for servers that have not misbehaved (i.e.,
they are considered good).

3. A set $\Lambda \subset \mathcal{G}$ with $|\Lambda| = t + 1$ is chosen in some public way.

4. 2poly-to-2sum is run. If there are misbehaving servers, their indices are removed from
$\mathcal{G}$ and the protocol loops to Step 3.

5. 2sum-to-2poly is run. If there are misbehaving servers (among $\Lambda$), their indices are
removed from $\mathcal{G}$ and the protocol loops to Step 3.

6. All previous share information is erased.



Fig. 11. DL-based Proactive Maintenance (Key Update) Protocol



5 Conclusion

To summarize, we have provided protocols for distributed public-key systems that are
adaptively secure. Our techniques and protocols are efficient and typically take constant
communication rounds when there are no faults (and a fault may cause a constant
delay).
A Basics of DL-based and RSA-based systems

A.1 Basics for DL-based systems

In these systems, we assume that p and q are two primes such that p = mq + 1 for
some small integer m, such as 2 or 4, and that g and h are elements of $Z_p^*$ of order q,
so $g^q \equiv h^q \equiv 1 \bmod p$. The El-Gamal public-key system and various signatures have been
developed based on the intractability of computing discrete logs, which formally is
stated as follows:

DLP Assumption Let k be the security parameter. The DLP assumption is as follows.
Given primes p and q as discussed above with |p| = k, and given an element $g \in Z_p^*$
of order q, and the group $G_g$ generated by g in $Z_p^*$, for any polynomial-time
algorithm $\mathcal{A}$, $\Pr[g^x \equiv y \bmod p : y \in_R G_g,\ x \leftarrow \mathcal{A}(1^k,p,q,g,y)]$ is negligible.

We use various sharing techniques. We assume the reader is familiar with Shamir
(t,l)-threshold polynomial secret sharing [46]. We will use the polynomial interpolation
formula explicitly, so we will describe it here. For a t-degree polynomial v(x), and a set
$\Lambda = \{i_1, \ldots, i_{t+1}\}$ of size t + 1, v(0) can be computed using polynomial interpolation.
Define $z_{i,\Lambda} = \prod_{j \in \Lambda \setminus \{i\}} (i-j)^{-1}(0-j)$. Then $v(0) = \sum_{i \in \Lambda} v(i) z_{i,\Lambda}$.
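As an illustration of the interpolation formula, the following Python sketch shares a secret with a random t-degree polynomial over Z_q and recovers v(0) from any t + 1 shares. The function names and the toy prime used in the usage note are assumptions for this example only.

```python
import secrets

def share(secret, t, l, q):
    """Return {i: v(i) mod q} for a random t-degree polynomial v with v(0) = secret."""
    coeffs = [secret % q] + [secrets.randbelow(q) for _ in range(t)]
    return {i: sum(c * pow(i, j, q) for j, c in enumerate(coeffs)) % q
            for i in range(1, l + 1)}

def lagrange_at_zero(i, index_set, q):
    """z_{i,Lambda} = prod over j in Lambda, j != i, of (i - j)^{-1} (0 - j) mod q."""
    z = 1
    for j in index_set:
        if j != i:
            # pow(..., -1, q) computes a modular inverse (Python 3.8+).
            z = z * (-j) * pow((i - j) % q, -1, q) % q
    return z

def reconstruct(points, q):
    """points maps t + 1 distinct indices i to v(i); returns v(0) mod q."""
    idx = list(points)
    return sum(points[i] * lagrange_at_zero(i, idx, q) for i in idx) % q
```

For example, with the toy prime q = 1019, `reconstruct` applied to any three of the five shares produced by `share(777, 2, 5, 1019)` returns 777.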
We now describe an Unconditionally-Secure (t,l)-VSS ((t,l)-US-VSS) due to Pedersen [41]
where secrets are drawn from $Z_q$ and verification shares are computed in $Z_p$.
We assume the servers do not know the discrete log of h relative to g (this can be assured
via proper initialization). The protocol begins with two (t,l)-threshold polynomial secret
sharings, sharing secrets $s, s' \in Z_q$. Let $a(x) = \sum_{j=0}^{t} a_j x^j$ be the random polynomial
used in sharing s and let $a'(x) = \sum_{j=0}^{t} a'_j x^j$ be the random polynomial used in sharing
s'. For all i, $S_i$ receives shares $s_i = a(i)$ and $s'_i = a'(i)$. (We refer to the pair $(a(i),a'(i))$
as dual-share i.) Also, the verification shares $\{\alpha_j (= g^{a_j} h^{a'_j})\}_{0 \le j \le t}$ are published.^ Say
check share $A_i = \prod_{j=0}^{t} \alpha_j^{i^j}$. $S_i$ can verify the correctness of his shares by checking that
$A_i = g^{s_i} h^{s'_i}$. Say s and s' are the shares computed using Lagrange interpolation from a set
of t + 1 shares that passed the verification step. If the dealer can reveal a different pair of
secrets that also corresponds to the zero coefficient verification share, then the dealer
can compute a discrete log of h relative to g.
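The verification step of the VSS above can be sketched in Python as follows. The parameters are toy values chosen as an assumption for illustration (p = 2q + 1 with both prime, and g, h squares modulo p, hence of order q); in this toy group the discrete log of h relative to g is of course easy to find, so the sketch shows only the algebra of the check, not its security.

```python
import secrets

P, Q = 2039, 1019   # toy primes with P = 2*Q + 1 (assumption; far too small for real use)
G, H = 4, 9         # squares mod P, hence elements of order Q

def poly_eval(coeffs, x):
    return sum(c * pow(x, j, Q) for j, c in enumerate(coeffs)) % Q

def deal(s, s_prime, t, l):
    """Share s with a(x) and s' with a'(x); return dual-shares and verification shares."""
    a = [s % Q] + [secrets.randbelow(Q) for _ in range(t)]
    a_prime = [s_prime % Q] + [secrets.randbelow(Q) for _ in range(t)]
    # alpha_j = g^{a_j} h^{a'_j} mod p
    alphas = [pow(G, aj, P) * pow(H, apj, P) % P for aj, apj in zip(a, a_prime)]
    dual = {i: (poly_eval(a, i), poly_eval(a_prime, i)) for i in range(1, l + 1)}
    return dual, alphas

def check_share(i, alphas):
    """A_i = prod_j alpha_j^(i^j) mod p; exponents may be reduced mod Q."""
    A = 1
    for j, alpha in enumerate(alphas):
        A = A * pow(alpha, pow(i, j, Q), P) % P
    return A

def verify(i, dual_share, alphas):
    s_i, s_i_prime = dual_share
    return check_share(i, alphas) == pow(G, s_i, P) * pow(H, s_i_prime, P) % P
```

An honest dual-share always passes `verify`, while changing either component by a nonzero amount modulo Q makes the check fail.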

A.2 Basics for RSA-based systems

RSA-based systems rely on the intractability of computing RSA inverses, and hence, the
intractability of factoring products of two large primes. Let k be the security parameter.
Let key generator GE define a family of RSA functions to be $(e,d,N) \leftarrow GE(1^k)$ such
that N is a composite number $N = P \cdot Q$ where P, Q are prime numbers of k/2 bits
each. The exponent e and modulus N are made public while $d \equiv e^{-1} \bmod \lambda(N)$ is kept
private.^ The RSA encryption function is public, defined for each message $M \in Z_N$ as:
$C = C(M) \equiv M^e \bmod N$. The RSA decryption function (also called signature function)
is the inverse: $M \equiv C^d \bmod N$. It can be performed by the owner of the private key d.
Formally the RSA Assumption is stated as follows.

RSA Assumption Let k be the security parameter. Let key generator GE define a family
of RSA functions (i.e., $(e,d,N) \leftarrow GE(1^k)$ is an RSA instance with security
parameter k). For any probabilistic polynomial-time algorithm $\mathcal{A}$,
$\Pr[u^e \equiv w \bmod N : (e,d,N) \leftarrow GE(1^k);\ w \in_R \{0,1\}^k;\ u \leftarrow \mathcal{A}(1^k,w,e,N)]$ is negligible.
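A toy numeric instance may help fix the notation. The sketch below derives d as the inverse of e modulo lcm(P - 1, Q - 1) and checks that applying the private exponent inverts the public one. The primes 61 and 53, the exponent 17 and the function name `rsa_keygen` are assumptions chosen for illustration and are far too small for real use.

```python
from math import gcd

def rsa_keygen(P, Q, e):
    """Return (N, e, d) with d = e^{-1} mod lcm(P - 1, Q - 1)."""
    N = P * Q
    lam = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)
    d = pow(e, -1, lam)  # modular inverse (Python 3.8+); requires gcd(e, lam) = 1
    return N, e, d

N, e, d = rsa_keygen(61, 53, 17)
M = 65
C = pow(M, e, N)           # public operation
recovered = pow(C, d, N)   # private operation (decryption / signing)
```

Here lcm(60, 52) = 780 and d = 413, since 17 * 413 = 7021 = 9 * 780 + 1.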

Next we describe variants of Shamir secret sharing and Pedersen VSS that we use
in RSA-based systems. They differ in that operations on the shares are performed over
the integers, instead of in a modular subgroup of integers.

(t,l)-secret sharing over the integers (INT-(t,l)-SS) [23] This is a variant of Shamir
secret sharing [46]. Let L = l! and let m be a positive integer. For sharing a secret
$s \in [0, mK]$ (and K the size of an interval over the integers), a random polynomial $a(x) =
\sum_{j=0}^{t} a_j x^j$ is chosen such that $a_0 = L^2 s$, and each other $a_j$ is chosen at random from {0, L, 2L, . . .

^ In DL-based systems, we implicitly assume all verification operations are performed in $Z_p^*$.

^ $\lambda(N) = \mathrm{lcm}(P-1, Q-1)$ is the smallest integer such that any element in $Z_N^*$ raised by $\lambda(N)$
is the identity element. RSA is typically defined using $\phi(N)$, the number of elements in $Z_N^*$,
but it is easy to see that $\lambda(N)$ can be used instead. We use it because it gives an explicit way
to describe an element of maximal order in $Z_N^*$. Note that $\phi(N)$ is a multiple of $\lambda(N)$, and that
knowing any value which is a multiple of $\lambda(N)$ implies breaking the system.

^ We note that in our RSA-based systems, $L^2 s$ is actually the secret component of the RSA
secret key, which when added to a public leftover component (in $[0, L^2 - 1]$), forms the RSA
secret key.
Each shareholder $i \in \{1, \ldots, l\}$ receives a secret share $s_i = a(i)$, and verifies^ that (1)
$0 \le s_i \le$ … and (2) L divides $s_i$. Any set $\Lambda$ of cardinality t + 1 can compute s
using Lagrange interpolation.

Unconditionally-Secure (t,l)-VSS over the Integers (INT-(t,l)-US-VSS)

This is a variant of Pedersen Unconditionally-Secure (t,l)-VSS [41], and is slightly
different than the version in [26]. Let N be an RSA modulus and let g and h be generators
whose discrete log modulo N with respect to each other is unknown. The protocol
begins with two (t,l)-secret sharings over the integers, the first sharing secret s with
m = 1, and the second sharing s' with m = NK. Note that $s \in [0,K]$ and $s' \in [0,NK^2]$. Let
$a(x) = \sum_{j=0}^{t} a_j x^j$ be the random polynomial used in sharing s and let $a'(x) = \sum_{j=0}^{t} a'_j x^j$
be the random polynomial used in sharing s'. For all i, $S_i$ receives shares $s_i = a(i)$ and
$s'_i = a'(i)$. (We refer to the pair $(a(i), a'(i))$ as dual-share i.) Also, the verification shares
$\{\alpha_j (= g^{a_j} h^{a'_j})\}_{0 \le j \le t}$ are published.® Say check share $A_i = \prod_{j=0}^{t} \alpha_j^{i^j}$. $S_i$ can verify the
correctness of his shares by checking that $A_i = g^{s_i} h^{s'_i}$. Say s and s' are the shares
computed using Lagrange interpolation from a set of t + 1 shares that passed the verification
step. If the dealer can reveal a different pair of secrets that also corresponds to the zero
coefficient verification share, then the dealer can compute an $\alpha$ and $\beta$ such that $g^\alpha \equiv h^\beta$,
which implies factoring (and thus breaking the RSA assumption).

Looking ahead, we will need to simulate an INT-(t,l)-US-VSS. Using Lemma 8,
we can do this by constructing a random polynomial over an appropriate simulated
secret (e.g., a random secret, or a secret obtained as a result of a previously simulated
protocol) in the zero coefficient, and a random companion polynomial with a totally
random zero coefficient. Note that the $\beta$ value in the lemma will correspond to K, and
the $\gamma$ value in the lemma will correspond to the discrete log of g with respect to h, which
is less than N. The probability of distinguishing a real VSS from the simulated VSS will
be $(4t + 2)/K$, which is exponentially small if the range of secrets K is exponentially
large.



B Distributed Public-Key Systems - Formal Definitions 

Definition 5. (Robustness of a Threshold System) A (t,l)-threshold public-key system
S is robust if for any polynomial-time t-restricted adaptive stationary adversary $\mathcal{A}$,
with all but negligible probability, for each input m which is submitted to the DistApply
protocol the resulting output s passes VERIFY(y,m,s).



Definition 6. (Security of a Threshold System) A (t,l)-threshold public-key system
S is secure if for any polynomial-time t-restricted adaptive stationary adversary $\mathcal{A}$,
after polynomially-many DistApply protocols performed during operational periods
on given values, given a new value m and the view of $\mathcal{A}$, the probability of being able
to produce an output s that passes VERIFY(y,m,s) is negligible.

® These tests only verify the shares are of the correct form, not that they are correct polynomial
shares.

® In RSA-based systems, we implicitly assume all verification operations are performed in $Z_N^*$.
Remark: The choice of the inputs to DistApply prior to the challenge m defines
the tampering power of the adversary (i.e., “known message,” “chosen message,”
“random message” attacks). The choice depends on the implementation within which the
distributed system is embedded. In this work, we assume that the (centralized)
cryptographic function is secure with respect to the tampering power of the adversary. We
note that the provably secure signature and encryption schemes typically activate the
cryptographic function on random values (decoupled from the message choice of the
adversary).

For the definitions of security and robustness properties of a distributed key
generation and proactive maintenance of a cryptosystem, see [33].



C ZK proofs 

We use efficient ZK proofs of knowledge (POKs) derived from [26] and [10]. These
are composed of combinations of Σ-protocols [9] (i.e., Schnorr-type proofs [45]). For
each ZK proof that we need, we will have a separate “proof” protocol, but there will be a
single “setup” protocol used for all ZK proofs. Say A wishes to prove knowledge of “W”
to B. Then the setup protocol will consist of B making a commitment and proving that he
can open it in a witness indistinguishable way [22], and the proof protocol will consist
of A proving to B either the knowledge of “W” or that A can open the commitment. (See
[10] for details.) This construction allows the proof protocols to be run concurrently
without any timing constraints, as long as they are run after all the setup protocols
have completed. (For more on the problems encountered with concurrent ZK proofs
see [37, 19, 20].)

The DL-based and RSA-based ZK-proof-setup protocols are exactly the Σ-protocols
for commitments over q-one-way-group-homomorphisms (q-OWGH), given in [10].
Recall the q-OWGH for a DL-based system with parameters $(p,q,g)$ is $f(x) = g^x \bmod p$,
and the q-OWGH for an RSA-based system with parameters $(N,e)$ is $f(x) = x^e \bmod N$
(with q = e in this case).

Let KE denote the “knowledge error” of a POK. 

Formally, we define ZKSETUP-DL$_{A,B}(p,q,g,h)$ as a protocol in which A generates
a commitment C and engages B in a witness-hiding (WH) POK (KE = 1/q) of $\sigma, \sigma' \in Z_q$
where $C \equiv g^{\sigma} h^{\sigma'} \bmod p$.

We define ZKSETUP-RSA$_{A,B}(N,e,g)$ as a protocol in which A generates a commitment
C and engages B in a WH POK (KE = 1/e)^ of $(\sigma,\sigma')$ (with $\sigma \in Z_e$, $\sigma' \in Z_N^*$)
where $C \equiv g^{\sigma} (\sigma')^e \bmod N$.

We define ZKPROOF-DL$_{A,B}(p,q,g,h,D)$ as a protocol in which A engages B in a
WH POK (KE = 1/q) of either $d \in Z_q$ where $D \equiv g^d \bmod p$, or $\tau, \tau' \in Z_q$ where $C_{B,A} \equiv
g^{\tau} h^{\tau'} \bmod p$ and $C_{B,A}$ is the commitment generated in ZKSETUP-DL$_{B,A}(p,q,g,h)$.

We define ZKPROOF-DL-REP$_{A,B}(p,q,g,h,E,D)$ as a protocol in which A engages
B in a WH POK (KE = 1/q) of either $d, d' \in Z_q$ where $D \equiv g^d h^{d'} \bmod p$ and $E \equiv g^d
\bmod p$, or $\tau, \tau' \in Z_q$ where $C_{B,A} \equiv g^{\tau} h^{\tau'} \bmod p$ and $C_{B,A}$ is the commitment generated
in ZKSETUP-DL$_{B,A}(p,q,g,h)$.

^ This implies e must be exponentially large in the security parameter k in order to obtain a
sound proof. However, if e is small (say e = 3) we can use different setup and proof protocols
described in [10] to obtain provably secure and robust RSA-based protocols.

We define* ZKPROOF-IF-REP$_{A,B}(N,e,m,g,h,E,D)$ as a protocol in which A, who
knows integers $d \in (-a,a]$ and $d' \in (-b,b]$ such that $E \equiv m^d \bmod N$ and $D \equiv g^d h^{d'} \bmod N$,
engages B in a WH POK (KE = 1/e) of either $\Delta \in Z_e^*$, $\delta \in (-2ae(N+1), 2ae(N+1)]$,
and $\delta' \in (-2be(N+1), 2be(N+1)]$ where $D^{\Delta} \equiv g^{\delta} h^{\delta'} \bmod N$ and $E^{\Delta} \equiv m^{\delta} \bmod N$,
or $(\tau,\tau')$ (with $\tau \in Z_e$, $\tau' \in Z_N^*$) where $C_{B,A} \equiv g^{\tau} (\tau')^e \bmod N$ and $C_{B,A}$ is the
commitment generated in ZKSETUP-RSA$_{B,A}(N,e,g)$. This protocol is honest-verifier
statistical zero-knowledge with a statistical difference between the distribution of views
produced by the simulator and in the real protocol bounded by 2/N.

C.1 Proof of representations

Here we give the main Σ-protocol used in ZKPROOF-IF-REP$_{A,B}(N,e,m,g,h,E,D)$.^

1. Initially, the parameters $(N,e,m,g,h,E,D)$ are public, and A knows integers $d \in
(-a,a]$ and $d' \in (-b,b]$ such that $E \equiv m^d \bmod N$ and $D \equiv g^d h^{d'} \bmod N$.

2. A generates $r \in_R (-aeN, aeN]$ and $r' \in_R (-beN, beN]$, computes $V \equiv m^r \bmod N$ and
$W \equiv g^r h^{r'} \bmod N$, and sends V, W to B.

3. B generates $c \in_R Z_e$ and sends c to A.

4. A computes $z = cd + r$ and $z' = cd' + r'$, and sends z, z' to B.

5. B checks that $m^z \equiv E^c V \bmod N$ and $g^z h^{z'} \equiv D^c W \bmod N$.

In all steps, A and B also check that the values received are in the appropriate ranges.

The above is a POK of $\Delta \in Z_e^*$, $\delta \in (-2ae(N+1), 2ae(N+1)]$, and $\delta' \in (-2be(N+1),
2be(N+1)]$ in which $m^{\delta} \equiv E^{\Delta} \bmod N$ and $g^{\delta} h^{\delta'} \equiv D^{\Delta} \bmod N$. The knowledge error
is 1/e, and the protocol is honest-verifier statistical zero-knowledge, with a statistical
difference between views produced by the simulator and those in the real protocol
bounded by 2/N.
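The message flow of the main protocol (commit, challenge, response, check) can be sketched in Python as below. This is only an illustration of the algebra for an honest prover: the range checks and the "OR" combination with the setup commitment are omitted, and the function names and the toy parameters in the usage note are assumptions. Negative exponents rely on Python 3.8+ modular inverses, so m, g and h must be invertible modulo N.

```python
import secrets

def rand_interval(bound):
    """Uniform integer in the half-open interval (-bound, bound]."""
    return secrets.randbelow(2 * bound) - bound + 1

def commit(N, e, m, g, h, a, b):
    """Prover: choose r in (-aeN, aeN], r' in (-beN, beN]; return V = m^r, W = g^r h^r'."""
    r, r_prime = rand_interval(a * e * N), rand_interval(b * e * N)
    V = pow(m, r, N)
    W = pow(g, r, N) * pow(h, r_prime, N) % N
    return (r, r_prime), (V, W)

def respond(c, d, d_prime, r, r_prime):
    """Prover: z = c*d + r and z' = c*d' + r', computed over the integers."""
    return c * d + r, c * d_prime + r_prime

def verify(N, m, g, h, E, D, V, W, c, z, z_prime):
    """Verifier: m^z = E^c V and g^z h^z' = D^c W (mod N)."""
    return (pow(m, z, N) == pow(E, c, N) * V % N and
            pow(g, z, N) * pow(h, z_prime, N) % N == pow(D, c, N) * W % N)
```

With toy values such as N = 3233, e = 17, m = 5, g = 2, h = 3, a = b = 10, d = 7 and d' = -4, an honest run always satisfies both checks, because m^(cd+r) = (m^d)^c m^r and likewise for the representation with respect to g and h.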

D Proofs of Techniques 

The following lemma from [23] is used to prove the subsequent lemma, which in turn
is used to prove the simulatability of INT-(t,l)-US-VSS.

Lemma 7. Let r{x) = ro + ri v H h rtx? be a random polynomial of degree t sueh that 

r(0) = ro = L^k (k G [0,Wp and rj Gi? {0,T, . . . , ^L^K} for I < j <t. Let A' be a set 
of t servers. Then with probability at least 1 — 2t/^, for any k G [0,W], there exists a 

polynomial r'{x) = r'f^ + r'jV H h r[f with r'(0) = ?*()= L^k and r'- G {0,T, . . . , ^L^K} 

for \<j< t such that r[i) = r\i) for i G A!. 



* IF stands for “integer factorization.” 

^ Recall that this main protocol is combined with a Σ-protocol proving knowledge of a
commitment generated in a setup protocol, using an “OR” construction.
Lemma 8. Let y € Let r{x) = ro + r\x + h rtx‘ be a random polynomial of

degree t such that r(0) = ro = L^k (k e [0,ATp and rj {0,/,, . . . , ^L^K} for 

Let f (x) = rg + /jX+ h r^V be a random polynomial of degree t such that r'(0) 

• = L?k! (k' G [0,ffjKpp and rj {0,/,, . . . , ^L?mK"\ for \ < j <t. Let A' be a set of 
t servers. Then with probability at least 1 — (4t + 2) /p, for any k G [0, AT], there exists 

polynomials r{x) = ro + r\x H h Ptx‘ with r(0) = ro = L^k and rj G {0,/,, . . . , ^L^K} 

far 1 < 7 < ^ and r\x) = rg -|- fjXH hr^V with r\f) = L^{jk + k' —jk), 0 < r^(0) < 

^L?mK and r'- G {0,/,, . . . , ^L^mK\ for I < j <t, such that r[i) = r[i) and rfi) = r'{i) 
for i G A', and y{x) + rfx) = jr{x) + r'{x). 

Proof. Except for the last equation, we get from Lemma 7 that the probability that the 
polynomials r{x) and f'(x) (with coefficients in the correct ranges) do not exist is at 
most [2/P] -b [2t/P] -b [2t/P], where the first 2/P arises from the probability that r'(0) 
is in the correct range, given an additive offset of L^{jk — jk) G [—L^mK,L^mK] from 
L^k! . If those polynomials do exist, then the last equation follows since (1) the degree 
of the polynomial on each side of the equivalence is t, (2) the polynomials obviously 
agree at the t locations in A! , and (3) the polynomials agree at 0, since Jr ft) + /(O) = 
L?-jk + L?k' = L'^jk + L^{jk + k' — jk) = 7f(0) + r'(0). 

D.1 Useful RSA Lemmas

Lemma 9 ([26]). Let k be the security parameter. Let modulus generator GE define a
family of modulus generating functions (i.e., $N \leftarrow GE(1^k)$ is an RSA modulus with
security parameter k). For any probabilistic polynomial-time adversary $\mathcal{A}$, the following
is negligible:

$\Pr[u^e \equiv w^d \bmod N;\ (e \ne 0) \vee (d \ne 0) : N \leftarrow GE(1^k);\ u,w \in_R \{0,1\}^k;\ e,d \leftarrow \mathcal{A}(1^k,w,u)]$

Proof. Similar to [2].

The following corollary follows from Lemma 9 and the RSA assumption (and hence
from the RSA assumption).

Corollary 10. Let k be the security parameter. Let GE be an RSA generator (that
produces large public exponents), i.e., $(N,e) \leftarrow GE(1^k)$. For any probabilistic
polynomial-time algorithm $\mathcal{A}$,

$\Pr[(g^{\alpha} \equiv h^{\beta} \bmod N;\ (\alpha \ne 0) \vee (\beta \ne 0)) \vee (g \equiv u^e) : (N,e) \leftarrow GE(1^k);\ g,h \in_R \{0,1\}^k;\ (\alpha,\beta,u) \leftarrow \mathcal{A}(1^k,g,h)]$

is negligible.

E Proofs of Protocols 

Proof of Theorem 1

We prove the robustness and security of the Basic DL-based threshold protocol. Both 
are based on the DLP Assumption. Recall that we assume the adversary is stationary 
and adaptive. 
Robustness Say P(k) is a polynomial bound on the number of messages m that the
protocol signs. Say an adversary prevents the signing of a message with non-negligible
probability $\rho$. We will show how to solve the DLP with probability

$\rho - \frac{tP(k) + 1}{q}$.

Say we are given an instance of the DLP, namely for a given set of parameters
$(p,q,g)$, we are given a uniformly chosen value $h \in Z_p^*$. We create a simulator that
runs the dealer as normal, except that h is taken from the DLP instance (when h is
distributedly generated, we were able to inject our DLP instance as well). Then the
servers are run as normal, except that the extractor for the ZKproof-DL-REP protocol
is run whenever an incorrupted server is playing the verifier and a corrupted server is
playing the prover. Note that since the simulator knows the secrets x, x', the normal
operation of the servers can be simulated easily. We will show that if an adversary is
able to prevent a message from being signed, we can (except with negligible probability)
determine DL(h,g).

If a server is not corrupted, the probability of a failed extraction using that server is
1/q. There will obviously be at least one incorrupted server that runs the extractor with
every corrupted server, and thus the probability of any corrupted server not allowing a
successful extraction during the protocol is at most tP(k)/q. Say $S_i$ runs the extractor
successfully on $S_j$. If $S_i$ extracts a way to open the commitment $C_{ij} = g^{\sigma_{ij}} h^{\sigma'_{ij}}$ (from
the setup protocol), say with $(\tau_{ij}, \tau'_{ij})$, then except with probability 1/q, this will give

$\mathrm{DL}(h,g) = \frac{\sigma_{ij} - \tau_{ij}}{\tau'_{ij} - \sigma'_{ij}}$.

Therefore, with probability at most

$\frac{tP(k) + 1}{q}$

there was either an extraction that failed or an extraction that succeeded, but produced
a way to open $C_{ij}$ with the same pair of values used to create it.

Now say the adversary prevents a message m from being signed. It should be clear
that after at most t + 1 attempts, there will be a set $\Lambda$ of t + 1 servers that participate
in signing m without any verification failures. This implies that the signature obtained
must be incorrect. Let $\{E_i\}_{i \in \Lambda}$ be the check shares for the (single) additive shares in this
signing attempt. If $\prod_{i \in \Lambda} E_i = y$ then the signature must be correct, so we may assume
$\prod_{i \in \Lambda} E_i \ne y$. Let $\{(\delta_i, \delta'_i)\}_{i \in \Lambda}$ be the extracted (or simulator generated, for incorrupted
servers) dual-shares. It is easy to see that

$g^x h^{x'} = \prod_{i \in \Lambda} B_i = \prod_{i \in \Lambda} D_i = \prod_{i \in \Lambda} g^{\delta_i} h^{\delta'_i}$.

We also have $g^x = y \ne \prod_{i \in \Lambda} E_i = \prod_{i \in \Lambda} g^{\delta_i}$. But then

$g^x h^{x'} = g^{\sum_{i \in \Lambda} \delta_i} h^{\sum_{i \in \Lambda} \delta'_i}$

with $x \ne \sum_{i \in \Lambda} \delta_i$ (and hence $x' \ne \sum_{i \in \Lambda} \delta'_i$), and thus

$\mathrm{DL}(h,g) = \frac{\sum_{i \in \Lambda} \delta_i - x}{x' - \sum_{i \in \Lambda} \delta'_i} \bmod q$.

Therefore, with probability $\rho - (tP(k) + 1)/q$, DL(h,g) can be found.

Security Here we show that if the adversary can sign a new message, then it can break
the security of the (non-distributed) signature scheme w.r.t. the same attack [33].

Say an adversary can sign a new message in the Basic DL-threshold Protocol with
non-negligible probability $\rho$. We will show that we can sign a new message in the
underlying signature scheme with probability $\rho$. Say the signature scheme has parameters
$(p,q,g,y)$. Then we create the following simulator:

1. Initialization: The signature scheme gives parameters $(p,q,g,y)$.

2. Simulate the dealer by generating $h \in_R Z_p^*$, $x' \in_R Z_q$, and producing a (t,l)-US-VSS
with polynomials $(a(),a'())$ on secrets 0, x' (i.e., $a(0) = 0$ and $a'(0) = x'$).

3. Each (ordered) pair of servers performs the ZKSETUP-DL protocol, using g and
h as the generators, except that an incorrupted verifier interacting with a corrupted
prover uses the extractor to determine how to open the commitment for that prover.

4. Each server maintains a list $\mathcal{G}$ of server indices for servers that have not misbehaved
(i.e., they are considered good).

5. When a message m needs to be signed, the following DistApply protocol is run:

(a) A set $\Lambda \subset \mathcal{G}$ with $|\Lambda| = t + 1$ is chosen in some public way.

(b) 2poly-to-2sum is performed using the simulator-generated (t,l)-US-VSS with
polynomials $(a(),a'())$, producing values $(b_j,b'_j)$ for $j \in \Lambda$, along with their
associated check shares. If there are misbehaving servers, their indices are
removed from $\mathcal{G}$ and the protocol loops to Step 5a.

(c) A faking server, say $S_i$, is picked at random from $\Lambda$. 2sum-to-2sum is performed
using the simulator-generated values $\{(b_j,b'_j)\}_{j \in \Lambda}$ and their associated check
shares from the previous step, producing values $(d_j,d'_j)$ for $j \in \Lambda$, along with
their associated check shares. If there are misbehaving servers, their indices are
removed from $\mathcal{G}$ and the protocol loops to Step 5a. If $S_i$ is compromised, then
the simulation rewinds to Step 5c, and is attempted again.

(d) For each $j \in \Lambda \setminus \{i\}$, $S_j$ performs 2sum-to-1sum using the simulator-generated
values $(d_j,d'_j)$ and their associated check shares from the previous step. $S_i$,
however, produces $E_i = y g^{d_i}$ and in each ZKproof-DL-REP actually proves
knowledge of how to open the commitment, instead of knowledge of the
discrete log of $E_i$. If there are misbehaving servers, their indices are removed from
$\mathcal{G}$ and the protocol loops to Step 5a. If $S_i$ is compromised, then the simulation
rewinds to Step 5c, and is attempted again.

(e) The additive application step of the signature protocol is run, but with $S_i$
simulated by using the signature on m obtained from the signature oracle. If there
are misbehaving servers, their indices are removed from $\mathcal{G}$ and the protocol
loops to Step 5a. If $S_i$ is compromised, then the simulation rewinds to Step 5c,
and is attempted again.
Note that the probability of rewinding is at most $t/(t + 1)$, and thus the simulator
requires on average a factor of at most t + 1 more time than the real protocol. Thus the
simulation is polynomial time. The simulation is perfect if all the extractors succeed.
Since the extractors have to be run at most tl times, and the probability of a single
extractor failing is 1/q, the probability of distinguishing a simulated view from a real
view is tl/q. Thus the simulation is statistically indistinguishable from the real protocol.

If the adversary is able to then generate a new signature, then it is clear that we
would have an algorithm to break the signature scheme.

Proof of Theorem 2

We prove the robustness and security of the Basic RSA-based threshold protocol. Both 
are based on Corollary 10. (We will call this the “Corollary 10 assumption.”) Recall 
that we assume the adversary is stationary and adaptive. We will assume that the public 
key e is large {Q{k) bits). (For small e we can use a technique from [10] to obtain ZK 
proofs that allow us to prove similar results.) 



Robustness Say P{k) is a polynomial bound on the number of messages m that the 
protocol signs. Say an adversary prevents the signing of a message with non-negligible 
probability p. We will show how to break the Corollary 10 assumption with probability 

2tP(k) + 1 

P 

Say an RSA public key was generated $(N,e) \leftarrow GE(1^k)$ and we are given uniformly
chosen $g,h \in Z_N^*$, as in Corollary 10. We use the simulator that is used to prove
security, except that g, h are taken from the RSA instance, and that the extractor for
the ZKPROOF-IF-REP protocol is run whenever an incorrupted server is playing the
verifier and a corrupted server is playing the prover. We will show that if an adversary
is able to prevent a message from being signed, we can (except with negligible
probability) either find $\alpha, \beta$ such that $g^{\alpha} \equiv h^{\beta} \bmod N$ or find u such that $u^e \equiv g \bmod N$.

If a server is not corrupted, the probability of a failed extraction using that server is
1/e. There will obviously be at least one incorrupted server that runs the extractor with
every corrupted server, and thus the probability of any corrupted server not allowing a
successful extraction during the protocol is at most tP(k)/e. Say $S_i$ runs the extractor
successfully on $S_j$. If $S_i$ extracts a way to open the commitment $C_{ij} = g^{\sigma_{ij}} (\sigma'_{ij})^e$, say
with $(\tau_{ij}, \tau'_{ij})$, then this will give

$g^{\sigma_{ij}} (\sigma'_{ij})^e \equiv g^{\tau_{ij}} (\tau'_{ij})^e$,

and thus

$g^{\sigma_{ij} - \tau_{ij}} \equiv (\tau'_{ij} / \sigma'_{ij})^e$.

Except with probability 1/e, $\gcd(\sigma_{ij} - \tau_{ij}, e) = 1$, so using the Extended Euclidean
algorithm one can compute $\alpha, \beta$ such that $\alpha e + 1 = \beta(\sigma_{ij} - \tau_{ij})$, and thus
$g \equiv \left((\tau'_{ij} / \sigma'_{ij})^{\beta} g^{-\alpha}\right)^e \bmod N$, which gives an RSA inverse of g.
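The Extended Euclidean step can be made concrete with a short Python sketch. Writing x for the exponent difference and w for the ratio of the two opening values, a relation g^x = w^e (mod N) with gcd(x, e) = 1 yields an e-th root of g. The function name and the toy parameters in the usage note are assumptions for illustration; negative exponents rely on Python 3.8+ modular inverses and need g invertible modulo N.

```python
def rsa_root_from_relation(g, w, x, e, N):
    """Given g^x = w^e (mod N) with gcd(x, e) = 1, return u with u^e = g (mod N)."""
    beta = pow(x, -1, e)           # beta * x = 1 (mod e)
    alpha = (beta * x - 1) // e    # exact division, so alpha*e + 1 = beta*x
    # g = g^(beta*x - alpha*e) = (w^beta * g^(-alpha))^e
    return pow(w, beta, N) * pow(g, -alpha, N) % N
```

For instance, with N = 3233, e = 17, g = 2, x = 5 and w an e-th root of g^5 modulo N, the returned u satisfies u^17 = 2 (mod 3233).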
Therefore, assuming the adversary has not distinguished the simulation from the
real protocol, with probability at most

$\frac{tP(k) + 1}{e}$

there was either an extraction that failed or an extraction that succeeded, but produced
a way to open $C_{ij}$ with an RSA-REP pair that did not allow one to compute the RSA
inverse of g. Recall that the probability the adversary can distinguish the simulation
from the real protocol is at most

$\frac{4t + 2}{N} + \frac{tl}{e}$.

Now say the adversary prevents a message m from being signed. It should be clear 
that after at most t + 1 attempts, there will be a set A of t + 1 servers that partic- 
ipate in signing m without any verification failures. This implies that the signature 
obtained must be incorrect. Let be the check shares for the (single) addi- 

tive shares in this signing attempt for message m. (Recall that for the faking server 
Si, Ei = m‘^' mod A.) Then nP* ^ Let {(6y,6y,Ay)}ygA be the 

extracted (or previously known to the simulator, for incorrupted servers) “dual shares,” 
along with the exponent on Dj. (For faking server Si, 5, = c/,, 8; = d\ and A, = 1.) Let 
A = b is easy to see that 



V/eA / V/eA / yeA 



We also have 












But then 

/jTA _ g.S/GA(5;A/Ay)^y./GA(5yA/A/)^ 

with 0 ^ 2^y£y^(8yA/Ay) (and hence x' ^ 2^ygy^(8yA/A/)), and thus 



^ ^TA-yy£A5)A/A,-^ 



Thus, assuming the adversary can prevent a message from being signed, the probability
of finding either $\alpha, \beta$ such that $g^{\alpha} \equiv h^{\beta} \bmod N$ or u such that $u^e \equiv g \bmod N$ is at
least the non-negligible probability

$\rho - \frac{tP(k) + 1 + tl}{e} - \frac{4t + 2}{N}$,

contradicting Corollary 10.
Security Here we reduce the security of our Basic RSA-threshold protocol to the RSA
assumption. Say an adversary, after watching polynomially many messages be signed 
in the Basic RSA-threshold protocol, can sign a new challenge message, with non- 
negligible probability p. Then we will give a polynomial-time algorithm to break RSA 
with probability close to p. 

Say we are given an RSA key {N,e) and a challenge message m* to be signed. 
We will run the adversary against a simulation of the protocol, and then present m* 
to be signed. We will show that the probability that an adversary can distinguish the 
simulation from the real protocol is negligible, and thus the probability that it signs m* 
is negligibly less than p. 

The simulator is as follows: 

1. Initialization: The RSA parameters {N,e) are given. We may also assume that we 
have a list of random message signature pairs 

2. Simulate the dealer by computing the public value x* (using public values N,e,L, 
as in the real protocol) generating g, h <Er Z'^, x! <Er Z^3 , and producing a INT- (t, /) - 
US-VSS with polynomials (d(),d'Q) on secrets 0,x' (i.e., d(0) = 0 and d'(0) =x'). 

3. Each (ordered) pair of servers performs the ZKSETUP-RSA protocol, using g and
h as the generators, except that an incorrupted verifier interacting with a corrupted
prover uses the extractor to determine how to open the commitment for that prover.

4. Each server maintains a list $\mathcal{G}$ of server indices for servers that have not misbehaved
(i.e., they are considered good).

5. When a message m needs to be signed, the following DistApply protocol is run: 

(a) A set $\Lambda \subset \mathcal{G}$ with $|\Lambda| = t + 1$ is chosen in some public way.

(b) 2poly-to-2sum is performed using the simulator-generated INT-(t,l)-US-VSS
with polynomials $(a(),a'())$, producing values $(b_j,b'_j)$ for $j \in \Lambda$, along with
their associated check shares. If there are misbehaving servers, their indices
are removed from $\mathcal{G}$ and the protocol loops to Step 5a.

(c) A faking server, say $S_i$, is picked at random from $\Lambda$. 2sum-to-2sum is performed
using the simulator-generated values $\{(b_j,b'_j)\}_{j \in \Lambda}$ and their associated check
shares from the previous step, producing values $(d_j,d'_j)$ for $j \in \Lambda$, along with
their associated check shares. If there are misbehaving servers, their indices are
removed from $\mathcal{G}$ and the protocol loops to Step 5a. If $S_i$ is compromised, then
the simulation rewinds to Step 5c, and is attempted again.

(d) For each j <E A\ {/}, Sj performs 2sum-to-lsum using the simulator-generated 
values {dj,dj) and their associated check shares from the previous step. Si, 

however, produces Ei = m^i and in each ZKproof-IF-REP actually 

proves knowledge of how to open the commitment, instead of knowledge of 
the discrete log of Ei. If there are misbehaving servers, their indices are re- 
moved from Q and the protocol loops to Step 5a. If Si is compromised, then 
the simulation rewinds to Step 5 c, and is attempted again. 

Note that the probability of rewinding is at most $t/(t + 1)$, and thus the simulator
requires on average a factor of at most t + 1 more time than the protocol. Thus the
simulation is polynomial time. The probability of distinguishing the simulation from
the real protocol is at most the probability of distinguishing the simulated INT-(t,l)-US-VSS
from the real one, plus the probability of an extractor failing. All of this can
be bounded by $(4t + 2)/N + tl/e$. Thus with probability negligibly less than $\rho$, we can
generate a signature on m*.
The computer industry has announced that a gigahertz processor will be available around
the year 2001. Such a chip runs with a cycle time of one nanosecond, which is
not only a great challenge in technology but an even greater challenge for the application
of mathematics in VLSI design. The timing graph of a microprocessor is a directed
graph with several million edges, where each edge models the signal processing through
combinatorial logic between two latches (registers). The latches themselves are governed
by clock signals. If the total travel time of a signal along one edge is at most one
nanosecond, deviations of even a few picoseconds due to design errors, technology,
production etc. matter substantially.

This talk gives an overview of the most recent applications of discrete optimization to
VLSI design. More specifically, we describe methods to minimize the cycle time, i.e.
the life span of a bit in a computer.

We have modeled the minimization problem of the cycle time of a microprocessor 
as an extended maximum mean weight cycle problem in a graph. By this, we are able to 
minimize the cycle time (or to maximize the frequency). Moreover, we can extend the 
mean weight cycle model in such a way that process variations, clock jitters, balancing 
problems, early mode problems and other additional constraints can be handled, too. 

By this approach the global cycle time can be minimized under all logically and
technically given constraints. It turned out that, without this application of
combinatorial optimization, industry would be unable to produce microprocessors with a
cycle time of one nanosecond.
Abstract. The traveling purchaser problem is a generalization of the traveling
salesman problem with applications in a wide range of areas including network 
design and scheduling. The input consists of a set of markets and a set of products. 
Each market offers a price for each product and there is a cost associated with 
traveling from one market to another. The problem is to purchase all products by 
visiting a subset of the markets in a tour such that the total travel and purchase 
costs are minimized. This problem includes many well-known NP-hard problems 
such as uncapacitated facility location, set cover and group Steiner tree problems 
as its special cases. 

We give an approximation algorithm with a poly-logarithmic worst-case ratio for 
the traveling purchaser problem with metric travel costs. For a special case of the 
problem that models the ring-star network design problem, we give a constant- 
factor approximation algorithm. Our algorithms are based on rounding LP relax- 
ation solutions. 



1 Introduction 

Problem. The traveling purchaser problem (TPP), originally proposed by Ramesh
[Ram 81], is a generalization of the traveling salesman problem (TSP). The problem
can be stated as follows. We are given a set $M = \{1, \ldots, m\}$ of markets and a set
$N = \{1, \ldots, n\}$ of products. Also, we are given $c_{ij}$, the cost of travel from market
city $i$ to city $j$, and nonnegative $d_{ij}$, the cost of product $i$ at market $j$. A purchaser
starts from his home city, say city 1, travels to a subset of the $m$ cities, purchases each
of the $n$ products in one of the cities he visits, and returns to his home city. The problem
is to find a tour for the purchaser such that the sum of the travel and purchase costs is
minimized. It is assumed that each product is available in at least one market city. If a
product $i$ is not available at market $j$, then $d_{ij}$ is set to a high value.
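To make the objective concrete, the following minimal sketch evaluates the cost of a candidate solution. It is only an illustration: the function name `tpp_cost`, the 0-based indexing (home market 0) and the small instance are assumptions that do not come from the paper.

```python
from typing import Dict, List, Sequence


def tpp_cost(tour: Sequence[int],
             purchase_at: Dict[int, int],
             c: List[List[float]],
             d: List[List[float]]) -> float:
    """Return travel cost plus purchase cost of a candidate TPP solution.

    tour        -- visited markets in order, starting at the home market;
                   the edge back to the start is implied
    purchase_at -- purchase_at[i] is the market where product i is bought
    c[j][k]     -- travel cost from market j to market k
    d[i][j]     -- price of product i at market j
    """
    if set(purchase_at) != set(range(len(d))):
        raise ValueError("every product must be purchased exactly once")
    visited = set(tour)
    if any(j not in visited for j in purchase_at.values()):
        raise ValueError("products can only be bought at visited markets")
    # Sum the tour edges, including the closing edge back to the start.
    travel = sum(c[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))
    purchase = sum(d[i][j] for i, j in purchase_at.items())
    return travel + purchase


# Hypothetical instance: 3 markets (home = 0), 2 products.
c = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
d = [[5, 1, 9], [4, 9, 1]]
cost_full_tour = tpp_cost([0, 1, 2], {0: 1, 1: 2}, c, d)  # travel 6, purchase 2
cost_stay_home = tpp_cost([0], {0: 0, 1: 0}, c, d)        # travel 0, purchase 9
```

In this toy instance, visiting all three markets is cheaper than buying everything at home, which shows the trade-off between the two cost terms.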

Applications. The traveling purchaser problem has applications in many areas,
including parts procurement in manufacturing facilities, warehousing, transportation,
telecommunication network design and scheduling. An interesting scheduling application
involves sequencing $n$ jobs on a machine that has $m$ states [Ong 82]. There is a set-up
cost of $c_{ij}$ to change the state of the machine from $i$ to $j$. A cost $d_{ij}$ is specified to
process job $i$ at state $j$. The objective is to minimize the sum of machine set-up and job
processing costs.

The traveling purchaser problem contains the TSP, the prize collecting TSP, the
uncapacitated facility location problem, the group Steiner tree problem and the set cover
problem as its immediate special cases. The TSP is the case when each market city has a
product available only at that city. In the uncapacitated facility location problem, let the
fixed cost for opening facility $j$ be $f_j$ and the cost of servicing client $i$ by facility $j$ be
$d_{ij}$. Then the problem is equivalent to a TPP with a market for each facility and a
product for each client, where the travel cost between markets $i$ and $j$ is
$c_{ij} = (f_i + f_j)/2$ and the purchase cost of product $i$ at market $j$ is $d_{ij}$. In the set
cover problem, we are given a set $S$ and subsets $S_1, \ldots, S_n \subseteq S$. The problem is to
find a minimum size collection of subsets whose union gives $S$. This corresponds to a TPP
where $S$ is the set of products and there is a market $j$ for each subset $S_j$. The cost of
purchasing product $i$ at market $j$ (of $S_j$) is zero if $i \in S_j$ and is a large number
otherwise. There is a unit cost of travel between each pair of markets. Then, there is a
set cover of size $k$ if and only if there is a TPP solution of cost $k$.
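The set cover reduction can be sketched as follows. This is an illustrative construction under stated assumptions: the name `set_cover_to_tpp` is hypothetical, the "large number" is taken to be one more than the number of markets (larger than any tour with unit travel costs), and the treatment of the home city, which the text does not spell out, is left aside.

```python
def set_cover_to_tpp(universe, subsets):
    """Build TPP cost matrices from a set cover instance.

    Returns (c, d) where c[j][k] is the travel cost between the markets of
    subsets j and k, and d[i][j] is the price of the i-th element (in sorted
    order) at the market of subset j.
    """
    elements = sorted(universe)
    m = len(subsets)
    big = m + 1  # assumed stand-in for "a large number"
    # Unit travel cost between distinct markets (a metric).
    c = [[0 if j == k else 1 for k in range(m)] for j in range(m)]
    # A product is free exactly at the markets whose subset contains it.
    d = [[0 if e in subsets[j] else big for j in range(m)] for e in elements]
    return c, d


c_sc, d_sc = set_cover_to_tpp({1, 2, 3}, [{1, 2}, {2, 3}, {3}])
```

Here the cover formed by the first two subsets corresponds to visiting the first two markets and buying every product at zero cost.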

Hardness. Note that since there is no polynomial time approximation algorithm 
for the general TSP, TPP with no assumptions on the costs cannot be approximated in 
polynomial time unless P = NP [GJ 79]. The TPP instance into which we reduce the 
set cover problem has metric travel costs. Therefore, from the above
approximation-preserving reduction and current hardness results for set cover
[F 96, RS 97, AS 97] it follows that there is no polynomial time approximation algorithm
for the traveling purchaser problem, even with metric travel costs, whose performance
ratio is better than $(1 - o(1)) \ln n$ unless P = NP.

Related Work. Due to the hardness of the problem, many researchers have focused 
on developing heuristics. Most of these algorithms are local search heuristics (Golden, 
Levy and Dahl [GLD 81], Ong [Ong 82], Pearn and Chien [PC 98]). Voss [V 96] gen- 
erated solutions by tabu search. The exact solution methods are limited to the branch- 
and-bound algorithm of Singh and van Oudheusden [SvO 97], which solves relaxations 
in the form of the uncapacitated facility location problem. 

Our Results. We give the first approximation results for the traveling purchaser
problem. We give an approximation algorithm with a poly-logarithmic worst-case ratio
for the TPP with metric travel costs (Corollary 7). In fact, this algorithm
approximates a more general bicriteria version of the problem (Theorem 6). For a special
case of the TPP that models the ring-star network design problem with
proportional costs, we give a constant-factor approximation algorithm (Theorem 13 and
Corollary 14).



2 Bicriteria Traveling Purchaser Problem 

We consider a bicriteria version of the traveling purchaser problem, where minimizing 
the purchase costs and the travel costs are two separate objectives. The bicriteria prob- 
lem is a generalization of the TPP, whose solutions provide the decision-maker insight 
into the tradeoffs between the two objectives. 
We use the framework due to Marathe et al. [MRS+ 95] for approximating a
bicriteria problem. We choose one of the criteria as the objective and bound the value of
the other by a budget constraint. Suppose we want to minimize objectives $A$ and $B$. We
consider the problem

$P$ : $\min B$ s.t. $A \le a$

Definition 1. An $(\alpha, \beta)$-approximation algorithm for the problem $P$ outputs a solution
with $A$-cost at most $\alpha$ times the budget $a$ and $B$-cost at most $\beta$ times the optimum value
of $P$, where $\alpha, \beta \ge 1$.

Our approximation algorithm rounds an LP relaxation solution. It uses the “filter- 
ing” technique of Lin and Vitter [LV 92] to obtain a solution feasible to the LP relax- 
ation of a closely related Group Steiner Tree (GST) problem. Then, the LP rounding 
algorithm of Garg, Konjevod and Ravi [GKR 97] is utilized to obtain a feasible solu- 
tion. 

2.1 Formulation: 

We represent the bicriteria TPP as the problem of minimizing the travel costs subject
to a budget $D$ on the purchasing costs. The following IP formulation is a relaxation of
the TPP, where the market cities that the purchaser visits are connected by a
2-edge-connected subgraph instead of a tour. In the formulation, the variable $x_{ij}$ indicates
whether product $i$ is purchased at market $j$, and variable $z_{jk}$ indicates whether markets
$j$ and $k$ are connected by an edge of the 2-edge-connected subgraph.

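\begin{align}
\min\ & \sum_{j,k \in M} c_{jk} z_{jk} \notag \\
\text{s.t.}\ & \sum_{i=1}^{n} \sum_{j=1}^{m} d_{ij} x_{ij} \le D \tag{1} \\
& \sum_{j=1}^{m} x_{ij} = 1, \quad i \in N \tag{2} \\
& \sum_{j \notin S} x_{ij} + \frac{1}{2} \sum_{j \in S,\, k \notin S} z_{jk} \ge 1, \quad i \in N,\ S \subset M,\ 1 \notin S \tag{3} \\
& x_{ij} \in \{0, 1\}, \quad i \in N,\ j \in M \tag{4} \\
& z_{jk} \in \{0, 1\}, \quad j, k \in M \tag{5}
\end{align}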

Constraint (1) is the budget constraint on purchase cost. Constraints (2) enforce that 
each product is purchased. Constraint set (3) is intended to capture the requirement 
of crossing certain cuts in the graph by edges in the subgraph that connect the visited 
markets. Consider a set of markets S not including the traveler’s start node 1, and a 
particular product i: Either i is purchased at a market not in S or the 2-edge-connected 
subgraph containing 1 must contain at least one market in S from where i is purchased, 
thus crossing at least two of the edges in the cut around S. This disjunction is expressed 
by constraints (3). 

The LP relaxation relaxes the integrality of the $x_{ij}$ and $z_{jk}$ variables. Although the LP
has an exponential number of constraints, it can be solved in polynomial time using
a separation oracle [GLS 88] based on a minimum cut procedure. To separate a given
solution $(z, x)$ over constraints (3) for a particular product $i$, we set up a capacitated
undirected graph as follows: For every edge $\{j, k\}$ of the complete graph on the market
nodes, we assign an edge-capacity $z_{jk}/2$. We add a new node $p_i$, and assign the capacity
of the undirected edge between $p_i$ and market node $j$ to be $x_{ij}$. A polynomial-time
procedure to determine the minimum cut separating 1 and $p_i$ [AMO 93] can now be
used to test violation of all constraints of type (3) for product $i$: the capacity of a cut
whose $p_i$-side contains the market set $S$ is exactly the left-hand side of (3), so some
constraint is violated if and only if the minimum cut has capacity less than 1. Repeating
this for every product $i$ provides a polynomial-time separation oracle for constraints (3).

2.2 Filtering 

Let $(x, z)$ be an optimal solution to the LP relaxation defined above. By filtering, we limit
the set of markets a product can be purchased at. For each product, we filter out markets
that offer a price substantially over the average purchase cost of the product in the LP
solution.

Let $D_i$ denote the purchase cost of product $i$ in the solution $(x, z)$, i.e.
$D_i = \sum_{j=1}^{m} d_{ij} x_{ij}$. For a given $\epsilon > 0$, define a group of markets for product $i$:
$G_i = \{ j \in M : d_{ij} \le (1 + \epsilon) D_i \}$. Every group $G_i$ gets at least a certain amount of
fractional assignment of product $i$ to its markets in the LP solution, as shown by the next
lemma.

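Lemma 2. For every product $i \in N$ and $\epsilon > 0$, $\sum_{j \in G_i} x_{ij} \ge \frac{\epsilon}{1+\epsilon}$.

Proof. Suppose for a contradiction that $\sum_{j \in G_i} x_{ij} < \frac{\epsilon}{1+\epsilon}$. Then, by constraint (2),
$\sum_{j \notin G_i} x_{ij} > \frac{1}{1+\epsilon}$. Note that
$D_i = \sum_{j \in M} d_{ij} x_{ij} \ge \sum_{j \notin G_i} d_{ij} x_{ij} > (1 + \epsilon) D_i \sum_{j \notin G_i} x_{ij}$
by the definition of $G_i$. Since $\sum_{j \notin G_i} x_{ij} > \frac{1}{1+\epsilon}$, we get the contradiction $D_i > D_i$.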
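The filtering step is easy to state in code. The sketch below assumes the fractional assignment is given as a matrix `x[i][j]` whose rows sum to 1; the function name `filter_groups` and the sample numbers are hypothetical and serve only to illustrate the definition of the groups and the bound of Lemma 2.

```python
def filter_groups(x, d, eps):
    """Return, for each product i, the markets priced at most (1 + eps) * D_i.

    x[i][j] -- fractional amount of product i bought at market j (row sums 1)
    d[i][j] -- price of product i at market j
    """
    groups = []
    for i in range(len(d)):
        m = len(d[i])
        avg_cost = sum(d[i][j] * x[i][j] for j in range(m))  # D_i in the text
        groups.append([j for j in range(m) if d[i][j] <= (1 + eps) * avg_cost])
    return groups


# Hypothetical fractional solution for one product and three markets.
x_frac = [[0.5, 0.4, 0.1]]
prices = [[1.0, 2.0, 10.0]]
eps = 0.5
groups = filter_groups(x_frac, prices, eps)
mass_in_group = sum(x_frac[0][j] for j in groups[0])
```

With these numbers the average cost is 2.3, the expensive third market is filtered out, and the remaining group carries 0.9 of the fractional assignment, well above the guaranteed eps / (1 + eps) = 1/3.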



2.3 Transformation to Group Steiner Tree Problem 

For each product we identified a group of markets to purchase the product. We now 
need to select at least one market from each group and connect them by a tour. For 
this, we take advantage of the Group Steiner Tree (GST) problem which can be stated 
as follows. Given an edge-weighted graph with some subsets of vertices specified as 
groups, the problem is to find a minimum weight subtree which contains at least one 
vertex from each group. We assume without loss of generality that node 1 is required 
to be included in the tree. We define the following GST instance. Let $G$ be a complete
graph on vertex set equal to the market set $M$. The weight of edge $\{i, j\}$ is set to $c_{ij}$
(note that we assume $c_{ij}$ is metric). Let the sets $G_i$, defined as above for each product
$i$, be the groups.

Consider the LP relaxation of this GST problem, which we denote by LP-GST. The 
variables zjk denote whether the edge between j and k is included in the tree. 

\begin{align}
\min\ & \sum_{j,k \in M} c_{jk} z_{jk} \notag \\
\text{s.t.}\ & \sum_{j \in S,\, k \notin S} z_{jk} \ge 1, \quad S \subset M,\ 1 \notin S \text{ and } G_i \subseteq S \text{ for some } i \tag{6} \\
& 0 \le z_{jk} \le 1, \quad j, k \in M \tag{7}
\end{align}

The nontrivial constraints (6) enforce that there is a path from node 1 to some node
in group $G_i$, for every $i$, in the solution.

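Lemma 3. Let $\hat{z}_{jk} = \frac{1+\epsilon}{2\epsilon} z_{jk}$. Then, $\hat{z}$ is feasible to LP-GST.

Proof. Consider $S \subset M$ containing $G_i$ but not city 1. By constraint (3),
$\sum_{j \notin S} x_{ij} + \frac{1}{2} \sum_{j \in S,\, k \notin S} z_{jk} \ge 1$. Also,
$\sum_{j \notin S} x_{ij} \le \sum_{j \notin G_i} x_{ij} \le \frac{1}{1+\epsilon}$ by Lemma 2. Then,
$\frac{1}{2} \sum_{j \in S,\, k \notin S} z_{jk} \ge 1 - \frac{1}{1+\epsilon} = \frac{\epsilon}{1+\epsilon}$. So, we have
$\sum_{j \in S,\, k \notin S} \hat{z}_{jk} \ge 1$.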

Garg, Konjevod and Ravi [GKR 97] gave a randomized approximation algorithm
that rounds a solution to LP-GST. A de-randomized version can be found in [CCGG 98].
Using either of these algorithms to round the solution $\hat{z}$ provides a tree that includes at
least one vertex from each group and has cost O(log^ m log log m) times
$\sum_{j,k \in M} c_{jk} \hat{z}_{jk}$.
We obtain a solution to the TPP as follows. Let $T$ be the tree output by the GST
rounding algorithm. Let $v_i$ be a market in $G_i$ included in $T$. We purchase product $i$ at
market $v_i$. We duplicate each edge in $T$ and find an Eulerian tour. We obtain a
Hamiltonian tour on the markets in $T$ by short-cutting the Eulerian tour. That is, while
traversing the Eulerian tour, when a node that has already been visited is next, we skip to
the next unvisited node, say $u$, and include an edge that connects the current node to $u$.
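The doubling and short-cutting step can be sketched as below. Visiting the vertices in the order in which they first appear on an Eulerian tour of the doubled tree is the same as a depth-first preorder of the tree, which is what this illustration computes. The function name `shortcut_tour`, the adjacency-list input and the example tree are assumptions made for the sketch.

```python
def shortcut_tour(tree_adj, root):
    """Return the vertices of a tree in depth-first preorder from root.

    This equals the order of first visits along an Eulerian tour of the
    doubled tree, i.e. the short-cut Hamiltonian tour (closing edge implied).
    tree_adj maps each vertex to the list of its tree neighbours.
    """
    tour, seen, stack = [], set(), [root]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        tour.append(v)
        # Push neighbours in reverse so they are visited in listed order.
        for u in reversed(tree_adj[v]):
            if u not in seen:
                stack.append(u)
    return tour


# Hypothetical example: markets on a line, tree edges 0-1, 0-2, 1-3.
pos = [0, 1, -1, 2]
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
tour = shortcut_tour(adj, 0)
tree_weight = abs(pos[0] - pos[1]) + abs(pos[0] - pos[2]) + abs(pos[1] - pos[3])
tour_cost = sum(abs(pos[tour[k]] - pos[tour[(k + 1) % len(tour)]])
                for k in range(len(tour)))
```

Because the travel costs are metric, every short cut is no longer than the path it replaces, so the tour costs at most twice the tree weight; in the example the bound is tight.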

The following lemmas are now immediate. 

Lemma 4. The TPP rounding algorithm outputs a solution with total purchase cost at
most $(1 + \epsilon) \sum_{i=1}^{n} \sum_{j=1}^{m} d_{ij} x_{ij}$, which is at most $(1 + \epsilon)$ times the budget $D$,
for any chosen $\epsilon > 0$.

Lemma 5. The TPP rounding algorithm outputs a solution with total travel cost at
most O((1 + 1/ε)(log^ m log log m)) $\sum_{j,k \in M} c_{jk} z_{jk}$, which is at most
O((1 + 1/ε)(log^ m log log m)) times the optimal TPP cost, for any chosen $\epsilon > 0$.

From Lemmas 4 and 5 we get the following theorem.

Theorem 6. The TPP rounding algorithm outputs a ((1 + ε), (1 + 1/ε) O(log^ m log log m))-
approximate solution for the bicriteria TPP with metric travel costs in polynomial
time, for any $\epsilon > 0$.

The same analysis gives a poly-logarithmic approximation for the TPP as well, 
where we relax the budget constraint on total purchase cost and add the cost to the 
objective function. 

Corollary 7. For any $\epsilon > 0$, the TPP rounding algorithm finds a solution for the TPP
with metric travel costs, whose cost is max{(1 + ε), (1 + 1/ε) O(log^ m log log m)} times
the optimal TPP cost, in polynomial time.

We note that the TPP with metric costs can be directly transformed to a group
Steiner tour problem* on a metric with $m + nm$ nodes, i.e., one of finding a tour that
visits at least one node from each group. To construct this metric, we begin with the
original metric $c$ on the market nodes. To each market node, we attach $n$ new nodes
via “leaf” edges, one for each product - such an edge from market node $j$ to its
product node $i$ is assigned cost $d_{ij}/2$. All other edges incident on the new nodes are given
costs implied by the triangle inequality. All the nodes corresponding to a product $i$
specify a group. Thus, there are $n$ groups, each with $m$ nodes. It is now
straightforward to verify that any group Steiner tour can be transformed to a solution to
the original traveling purchaser instance with the same cost. Applying the rounding
algorithms for group Steiner trees and short-cutting the tree obtained to a tour gives a
direct O(log^ (m + nm) log log(m + nm)) approximation to the metric TPP.

* This is also called the generalized TSP in the literature; see [FGT 97].
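The construction of the extended metric can be sketched as follows. The node numbering (market $j$ keeps index $j$, the copy of product $i$ hanging off market $j$ gets index $m + jn + i$) and the name `group_steiner_metric` are assumptions of this illustration; distances involving the new nodes are the shortest-path lengths through the leaf edges of cost $d_{ij}/2$.

```python
def group_steiner_metric(c, d):
    """Build the (m + n*m)-node metric and the n groups described in the text.

    c[j][k] -- metric travel cost between markets j and k (c[j][j] == 0)
    d[i][j] -- price of product i at market j
    Returns (dist, groups); groups[i] lists the nodes representing product i.
    """
    m, n = len(c), len(d)
    size = m + n * m

    def attach(v):
        """Return (market the node hangs off, length of its leaf edge)."""
        if v < m:
            return v, 0.0
        j, i = divmod(v - m, n)
        return j, d[i][j] / 2

    dist = [[0.0] * size for _ in range(size)]
    for u in range(size):
        ju, lu = attach(u)
        for v in range(size):
            if u != v:
                jv, lv = attach(v)
                # Shortest path: leaf edge, market-to-market hop, leaf edge.
                dist[u][v] = lu + c[ju][jv] + lv
    groups = [[m + j * n + i for j in range(m)] for i in range(n)]
    return dist, groups


# Hypothetical instance: 2 markets, 1 product priced 4 and 6.
dist_gs, groups_gs = group_steiner_metric([[0, 2], [2, 0]], [[4, 6]])
```

A round trip from market 0 to its product node and back costs 2 + 2 = 4, which is exactly the price $d_{00}$, so tour cost in the new metric accounts for both travel and purchases.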

3 Network Design with Proportional Cost Metrics 

In this section we consider a special case of the traveling purchaser problem, which
models a telecommunication network design problem. A communication network
consists of several local access networks (LANs) that collect traffic of user nodes at the
switching centers, and a backbone network that routes high-volume traffic among
switching centers. We model this problem by requiring a ring architecture for the backbone
network and a star architecture for the LANs. The ring structure is preferred for its
reliability. Because of the “self-healing” properties associated with SONET rings, ring
structures promise to be of increasing importance in future telecommunication networks
([Kli 98]). The formal model follows.

We are given a graph $G = (V, E)$, with length $l_e$ on edge $e$. Without loss of generality,
we use the metric completion of the given graph. That is, the length of an edge $e$ is replaced
by the shortest-path length $d_e$ between its endpoints. The problem is to pick a tour (ring
backbone) on a subset of the nodes and connect the remaining nodes to the tour such
that the sum of the tour cost and the access cost is minimized. The access cost of connecting
a non-tour node $i$ to a tour node $j$ is $d_{ij}$, i.e. the shortest-path length between $i$ and $j$. The
access cost includes the cost of connecting all non-tour nodes to the tour. On the other
hand, the cost of including an edge $e$ in the tour is $p\,d_e$, where the constant $p \ge 1$ reflects
the more expensive cost of higher bandwidth connections in the backbone network.

This problem is a special case of TPP where the vertices of the graph correspond to 
both the set of markets and the set of products [V 90]. With the TPP terminology, the 
purchase cost of a product of node i at the market of node j is the shortest path length 
between nodes i and j. Thus, if node i is included in the tour, its product is purchased 
at its own market at zero cost. We consider a bicriteria version of this problem with the 
two objectives of minimizing the tour cost and minimizing the access cost. We use the 
following notation to denote the problems considered. 

(A, T, p): Minimize tour cost T subject to a budget on the access cost A, where a
tour edge costs p times the edge length.

(A + T, p): Minimize the sum of the tour and access costs, where a tour edge costs p
times the edge length.

3.1 Hardness 

The bicriteria problem (A, T, p) is NP-hard even when p = 1. When the budget on the
access cost A is set to zero, the problem reduces to the TSP since every node must be
included in the tour. We show that it is NP-hard to approximate this problem with a
sub-logarithmic performance ratio without violating the budget constraint. This result
does not follow from the inapproximability of TSP since we assume that the distances
$d_{ij}$ are metric.

Theorem 8. There exists no $(1, \alpha)$-approximation algorithm, for any $\alpha = o(\log n)$, for
the (A, T, 1) problem unless P = NP. Here $n$ is the number of nodes in the (A, T, 1)
instance.

The proof (omitted) is by an approximation preserving reduction from the connected 
dominating set problem. Note that since (A, T, 1 ) is a special case of (A, T, p), the same 
hardness result holds for (A, T, p). 

Theorem 9. The single criteria problem (A + T, 1 ) is NP-hard. 

The proof (omitted) is by a reduction from the Hamiltonian tour problem in an
unweighted graph, which is known to be NP-hard [GJ 79]. Again, since (A + T, 1) is a
special case of (A + T, p), NP-hardness of the latter follows as well.

3.2 Approximation 

There exists a simple 2-approximation algorithm for the (A + T, 1 ) problem. Find a 
minimum spanning tree of G, say MST, duplicate the edges of MST and shortcut this 
to a tour. Note that every node is included in the tour so that the access cost is zero. 
The cost of the tour is at most 2 times the cost of MST, which is a lower bound on the 
optimal cost. 
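The simple heuristic above can be sketched in a few lines. The code is an illustration under assumptions: the input is a symmetric metric given as a full distance matrix, Prim's algorithm is used for the spanning tree (the text does not prescribe one), and the name `mst_tour` is hypothetical.

```python
def mst_tour(dist):
    """Return (tour, mst_weight): a tour through all nodes obtained by
    short-cutting a doubled minimum spanning tree, and the tree's weight."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest connection of each node to the tree
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    mst_weight = 0.0
    for _ in range(n):          # Prim's algorithm, O(n^2)
        v = min((u for u in range(n) if not in_tree[u]), key=lambda u: best[u])
        in_tree[v] = True
        mst_weight += best[v]
        if parent[v] >= 0:
            children[parent[v]].append(v)
        for u in range(n):
            if not in_tree[u] and dist[v][u] < best[u]:
                best[u], parent[u] = dist[v][u], v
    # Preorder of the tree = short-cut Eulerian tour of the doubled tree.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour, mst_weight


# Hypothetical instance: four nodes on a line.
pts = [0, 1, 3, 6]
dmat = [[abs(a - b) for b in pts] for a in pts]
ring, w = mst_tour(dmat)
ring_cost = sum(dmat[ring[k]][ring[(k + 1) % len(ring)]] for k in range(len(ring)))
```

Every node lies on the resulting ring, so the access cost is zero and the tour cost is bounded by twice the spanning tree weight.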

Note that this heuristic is a 2p-approximation algorithm for (A + T, p). However, we
obtain a stronger constant factor approximation for both the bicriteria and single
objective problems for arbitrary p by LP rounding. The LP rounding algorithm uses filtering
to limit the set of tour nodes a node can be connected to, as in the TPP rounding
algorithm. However, the construction of the tour differs from the TPP rounding algorithm.
Tour nodes are chosen based on the access costs and the tour is built by shortcutting an
MST on a graph obtained by contracting balls around the tour nodes.

We assume that a root node r is required to be included to the tour (this is similar 
to including the home city in the TPP). If no such node is specified, we can run the 
algorithm n times, each time with a different root node, and pick the best solution. We 
use the following relaxation of (A, T, p), which is very similar to the relaxation that we 
used in the TPP rounding algorithm. 

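\begin{align}
\min\ & p \sum_{e \in E} d_e z_e \notag \\
\text{s.t.}\ & \sum_{i \in V} \sum_{j \in V} d_{ij} x_{ij} \le D \tag{1} \\
& \sum_{j \in V} x_{ij} = 1, \quad i \in V \tag{2} \\
& \sum_{j \notin S} x_{ij} + \frac{1}{2} \sum_{e \in \delta(S)} z_e \ge 1, \quad i \in V,\ S \subset V,\ r \notin S \tag{3} \\
& x_{ij} \in \{0, 1\}, \quad i, j \in V \tag{4} \\
& z_e \in \{0, 1\}, \quad e \in E \tag{5}
\end{align}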
Variable $x_{ij}$ indicates whether node $i$ is connected to the tour at node $j$, and variable
$z_e$ indicates whether edge $e$ is included in the tour. Constraint (1) is the budget
constraint on access cost. Here, $d_{ij}$ denotes the shortest path length between nodes $i$ and $j$.
Constraint (2) ensures that every node has access to the tour. For a node set $S$ excluding
$r$, constraint (3) ensures that at least two edges of the cut around $S$, denoted by $\delta(S)$, are
included in the tour, if some node has been assigned to access the tour at a node in $S$.
We obtain the LP relaxation (LPR) by relaxing the integrality in constraints (4) and (5).



We need a few definitions before we describe the algorithm. A ball of radius $r$
around a node $i$ is the set of all points in $G$ that are within distance $r$ from $i$ under the
length function $d_e$ on the edges. The ball may include nodes, edges and partial edges as
illustrated in Figure 1a. When we contract a ball around a node into a single node, (i)
we delete edges with both ends in the ball; (ii) we connect the edges with exactly one
endpoint in the ball to the new node and shorten their length by the length remaining
in the ball (Figure 1b). Let $\epsilon > 0$ and $\alpha > 1$ be input parameters. The algorithm is as
follows:

(1) Solve LPR, let x,z be an optimal solution. 

(2) Let $D_i$ denote the access cost of node $i$ in this solution, i.e. $D_i = \sum_{j \in V} d_{ij} x_{ij}$.

(3) Let $\bar{D}_i = (1 + \epsilon) D_i$ and define a ball $B_i$ around every node $i$ of radius $\alpha \bar{D}_i$.

(4) Preprocessing step: remove all balls containing r and connect their centers to r in 
the access network. 

(5) While unprocessed balls remain: 

(5.1) Pick a ball with minimum radius, say $B_k$, and mark it as a “tour ball”.

(5.2) Remove all balls intersecting $B_k$ and mark them as “connected via $B_k$”.

(6) Contract each tour ball to a node. Let G' be a complete graph on the contracted 
nodes and r, with edge weights equal to shortest path lengths in G (after contrac- 
tions). 

(7) Find an MST of G' and construct H by replacing edges of the MST by shortest 
paths in G. 

(8) Duplicate edges of H and shortcut them to a tour PT. 




Fig. 1. The definition and contraction of a ball: (a) a ball around i of radius r; (b) contraction of the ball
(9) Uncontract the balls. Construct tour $T$ by connecting the center node $i$ of each ball
$B_i$ to $PT$.

(10) Connect the center of every ball marked “connected via $B_k$” directly to $k$ in the
access network.
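Steps (4) and (5) of the algorithm, which select the tour balls, can be sketched as below. The sketch makes two assumptions that are not spelled out in the text: the input is the full shortest-path distance matrix, and two balls are taken to intersect exactly when the distance between their centers is at most the sum of their radii (balls contain partial edges, so this holds in the graph metric). The name `select_tour_balls` and the example data are hypothetical.

```python
def select_tour_balls(dist, radius, root):
    """Greedy choice of tour balls.

    dist[i][k] -- shortest-path distance between nodes i and k
    radius[i]  -- radius of the ball around node i
    Returns (tour_centers, connected_via), where connected_via[i] is the
    node that non-tour node i is attached to in the access network.
    """
    connected_via = {}
    active = set()
    for i in range(len(dist)):
        if i == root:
            continue
        if dist[i][root] <= radius[i]:   # step (4): ball contains the root
            connected_via[i] = root
        else:
            active.add(i)
    tour_centers = []
    while active:                        # step (5)
        k = min(active, key=lambda i: (radius[i], i))   # (5.1) smallest ball
        active.remove(k)
        tour_centers.append(k)
        for i in sorted(active):         # (5.2) remove intersecting balls
            if dist[i][k] <= radius[i] + radius[k]:
                active.remove(i)
                connected_via[i] = k
    return tour_centers, connected_via


# Hypothetical example: five nodes on a line, root at node 0.
line = [0, 1, 5, 6, 20]
dline = [[abs(a - b) for b in line] for a in line]
centers, via = select_tour_balls(dline, [0.5, 1.0, 1.0, 0.3, 2.0], root=0)
```

Picking balls in order of increasing radius is what guarantees, in Lemma 10, that a node is never attached to a tour ball larger than its own.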

Before we analyze the worst-case performance of the algorithm, let us clarify how
we process a ball in Step (9). Let $b_i$ be the contracted node corresponding to $B_i$. Let
$e_1$ and $e_2$ be the edges incident on $b_i$ in $PT$. Let $v_1$ and $v_2$ be the endpoints of $e_1$ and
$e_2$ in $B_i$. Connect the center node $i$ to the tour by adding edges $(i, v_1)$ and $(i, v_2)$ (see
Figure 2). Extend $e_1$ and $e_2$ to include the portions in the ball.




Fig. 2. Uncontracting a ball to include in PT 



Lemma 10. The rounding algorithm outputs a solution with access cost at most
$2\alpha(1 + \epsilon)$ times the budget $D$.

Proof. Each nontour node $i$ is connected to a tour node $k$ such that $B_i \cap B_k$ is nonempty
and $\bar{D}_k \le \bar{D}_i$ by the choice of the tour balls in the algorithm. Then, the access cost
of $i$ is at most $\alpha \bar{D}_i + \alpha \bar{D}_k \le 2\alpha \bar{D}_i = 2\alpha(1 + \epsilon) \sum_{j \in V} d_{ij} x_{ij}$. Since $x$ is a
solution to the relaxation LPR, it satisfies the budget constraint
$\sum_{i \in V} \sum_{j \in V} d_{ij} x_{ij} \le D$. Thus, the access cost is at most $2\alpha(1 + \epsilon) D$.



Remark 11. The argument in the above proof is also valid for a problem where an 
access cost budget is specified separately for each node instead of a single budget con- 
straint on the total access cost. 



Lemma 12. The rounding algorithm outputs a solution with tour cost at most
$\max\{2, \frac{\alpha}{\alpha - 1}\}(1 + \frac{1}{\epsilon})$ times the optimal cost.

Proof. We use the following definitions. For an edge set $M$, let $c(M) = p \sum_{e \in M} d_e$. Let
$P$ be the set of nodes included in the tour $T$ output by the algorithm. Let $G_i$ be a ball
around $i$ of radius $\bar{D}_i$. Let $E_c$ denote the edge set of the contracted graph. That is, $E_c$
excludes from $E$ all edges with both ends in a tour ball as well as portions of the edges
with one end point strictly inside a tour ball.

The proof follows from the following claims. 

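Claim 1: $c(T) \le c(PT) + 2\alpha(1 + \epsilon)\, p \sum_{i \in P} D_i$.

Claim 2: $2(\alpha - 1)\epsilon\, p \sum_{i \in P} D_i \le p \sum_{i \in P} \sum_{e \in (B_i - G_i)} d_e z_e$.

Claim 3: $c(PT) \le 2c(MST) \le 2(1 + \frac{1}{\epsilon})\, p \sum_{e \in E_c} d_e z_e$.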

Proof of Claim 1: The cost of the tour $T$ equals the cost of $PT$, the tour on the
contracted nodes, plus the cost of the edges in the tour balls that connect the tour nodes
to $PT$. For a tour ball $B_i$, suppose $PT$ touches $B_i$ at points $k_1$ and $k_2$. The path in $B_i$
connecting $k_1$ to the center node $i$ and $i$ to $k_2$ has cost at most $2\alpha(1 + \epsilon)\, p D_i$ since $B_i$ has
radius $\alpha(1 + \epsilon) D_i$.

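Proof of Claim 2: By an argument similar to the proof of Lemma 2 it can be shown
that for any $i \in V$, $\sum_{j \in G_i} x_{ij} \ge \frac{\epsilon}{1+\epsilon}$. Then, by constraint (3) of LPR, it follows that
$\sum_{e \in \delta(G_i)} z_e \ge \frac{2\epsilon}{1+\epsilon}$ for any $G_i$ excluding $r$. Note that a fractional $z$ value
of at least $\frac{2\epsilon}{1+\epsilon}$ must go a distance of at least $(\alpha - 1)\bar{D}_i$ to get out of the ball $B_i$.
We can consider this distance as a moat around $G_i$ of width $(\alpha - 1)\bar{D}_i$. So, we get
$p \sum_{e \in (B_i - G_i)} d_e z_e \ge p \frac{2\epsilon}{1+\epsilon} (\alpha - 1) \bar{D}_i = 2 p \epsilon (\alpha - 1) D_i$ for any $i \in P$,
since $\bar{D}_i = (1 + \epsilon) D_i$.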

Proof of Claim 3: The first inequality easily follows since we obtain $PT$ by
shortcutting MST. To show the second inequality, we show that $\hat{z} = \frac{1+\epsilon}{2\epsilon} z$ is a feasible
solution to an LP relaxation of a Steiner tree problem on the contracted graph
$G_c = (V_c, E_c)$, with terminal nodes being the contracted balls and $r$.

Consider $S$ containing $B_i$ but not $r$. By constraint (3) of LPR,
$\sum_{j \notin S} x_{ij} + \frac{1}{2} \sum_{e \in \delta(S)} z_e \ge 1$. By the definition of $B_i$, we also have
$\sum_{j \notin S} x_{ij} \le \frac{1}{1+\epsilon}$. Then, $\frac{1}{2} \sum_{e \in \delta(S)} z_e \ge 1 - \frac{1}{1+\epsilon} = \frac{\epsilon}{1+\epsilon}$.
So, $\sum_{e \in \delta(S)} \hat{z}_e \ge 1$. Thus, $\hat{z}$ is a feasible solution to the LP relaxation
of the Steiner tree problem on $G_c$. Let $c(ST)$ be the cost of the LP relaxation of the
Steiner tree problem on $G_c$ with terminal set the contracted nodes plus $r$ and edge
costs $p\,d_e$. Then, $c(MST) \le 2c(ST)$ (see, e.g. [AKR 95]). Since $\hat{z}$ is a feasible solution,
$c(ST) \le \sum_{e \in E_c} p\, d_e \hat{z}_e = \frac{1+\epsilon}{2\epsilon} \sum_{e \in E_c} p\, d_e z_e$. Thus, the claim follows.

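From Claims 1, 2 and 3 we get

$c(T) \le 2(1 + \frac{1}{\epsilon}) \Big( p \sum_{e \in E_c} d_e z_e \Big) + \frac{\alpha}{\alpha - 1} (1 + \frac{1}{\epsilon})\, p \sum_{i \in P} \sum_{e \in (B_i - G_i)} d_e z_e$.

Since $E_c$ excludes edges in $B_i$ for any $i \in P$,
$c(T) \le \max\{2, \frac{\alpha}{\alpha - 1}\}(1 + \frac{1}{\epsilon})\, p \sum_{e \in E} d_e z_e \le \max\{2, \frac{\alpha}{\alpha - 1}\}(1 + \frac{1}{\epsilon})\, OPT$,
where $OPT$ is the optimal cost of the (A, T, p) problem.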

From Lemmas 10 and 12, the next result follows immediately. 

Theorem 13. For any $\epsilon > 0$, $\alpha > 1$ and any $p$, the rounding algorithm outputs a
$(2\alpha(1 + \epsilon), \max\{2, \frac{\alpha}{\alpha - 1}\}(1 + \frac{1}{\epsilon}))$-approximate solution for the bicriteria problem
(A, T, p) in polynomial time.
For minimizing the sum of the two objectives, the performance ratio of the algorithm
is the maximum of the two ratios for the separate objectives. The best ratio is obtained
by setting $\epsilon = 1/\sqrt{2}$ and $\alpha = 1 + 1/\sqrt{2}$, yielding a performance ratio of $3 + 2\sqrt{2}$.

Corollary 14. The rounding algorithm is a $(3 + 2\sqrt{2})$-approximation algorithm for
the (A + T, p) problem.
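The parameter choice behind Corollary 14 can be checked numerically. The sketch below assumes the two ratios of Theorem 13 read $2\alpha(1+\epsilon)$ for the access cost and $\max\{2, \alpha/(\alpha-1)\}(1 + 1/\epsilon)$ for the tour cost; the function name `ratios` is hypothetical.

```python
import math


def ratios(eps, alpha):
    """Return (access ratio, tour ratio) for parameters eps > 0, alpha > 1."""
    access = 2 * alpha * (1 + eps)
    tour = max(2.0, alpha / (alpha - 1)) * (1 + 1 / eps)
    return access, tour


eps_star = 1 / math.sqrt(2)
alpha_star = 1 + 1 / math.sqrt(2)
access_ratio, tour_ratio = ratios(eps_star, alpha_star)
target = 3 + 2 * math.sqrt(2)   # roughly 5.83
```

At this choice both ratios coincide, since $2(1 + 1/\sqrt{2})^2 = (1 + \sqrt{2})^2 = 3 + 2\sqrt{2}$, which is why the maximum of the two cannot be lowered by trading one against the other.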
Abstract. We consider the problem of distributed deterministic broadcasting in
radio networks. Nodes send messages in synchronous time-slots. Each node v 
has a given transmission range. All nodes located within this range can receive 
messages from v. However, a node situated in the range of two or more nodes 
that send messages simultaneously, cannot receive these messages and hears only 
noise. Each node knows only its own position and range, as well as the maximum of 
all ranges. Broadcasting is adaptive: Nodes can decide on the action to take on the 
basis of previously received messages, silence or noise. We prove a lower bound 
on broadcasting time in this model and construct a broadcasting protocol whose 
performance matches this bound for the simplest case when nodes are situated on 
a line and the network has constant depth. We also show that if nodes do not even 
know their own range, every broadcasting protocol must be hopelessly slow. 
While distributed randomized broadcasting algorithms, and, on the other hand, 
deterministic off-line broadcasting algorithms assuming full knowledge of the 
radio network, have been extensively studied in the literature, ours are the first 
results concerning broadcasting algorithms that are distributed and deterministic 
at the same time. We show that in this case the amount of knowledge available to 
nodes influences the efficiency of broadcasting in a significant way.



1 Introduction 



Radio communication networks have recently received growing attention. This is due to 
the expanding applications of radio communication, such as cellular phones and wireless 
local area networks. The relatively low cost of infrastructure and the flexibility of radio
networks make them an attractive alternative to other types of communication media. 



J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 41-52, 1999.
© Springer-Verlag Berlin Heidelberg 1999







K. Diks et al. 



A radio network is a collection of transmitter-receiver devices (referred to as nodes). 
Nodes send messages in synchronous time-slots. Each node v has a given transmission 
range. All nodes located within this range can receive messages from v. However, a node 
situated in the range of two or more nodes that send messages simultaneously, cannot 
receive these messages and hears only noise. 
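The reception rule of this model can be illustrated for nodes on a line. The sketch is an assumption-laden toy: the names `receive_slot` and `reach` are hypothetical, and a transmitting node is assumed not to listen during its own slot, which the text does not specify.

```python
def receive_slot(position, reach, transmitters):
    """Simulate one synchronous time-slot.

    position[v]  -- coordinate of node v on the line
    reach[v]     -- transmission range of node v
    transmitters -- set of nodes sending in this slot
    Returns a dict mapping every non-transmitting node to
    'message', 'noise' or 'silence'.
    """
    heard = {}
    for v in range(len(position)):
        if v in transmitters:
            continue  # assumption: a sender does not receive in its own slot
        # Count the senders whose range covers node v.
        covering = sum(1 for u in transmitters
                       if abs(position[u] - position[v]) <= reach[u])
        if covering == 0:
            heard[v] = "silence"
        elif covering == 1:
            heard[v] = "message"
        else:
            heard[v] = "noise"   # collision of two or more transmissions
    return heard


positions = [0, 1, 2, 3]
reaches = [1, 2, 1, 1]
single = receive_slot(positions, reaches, {0})
collision = receive_slot(positions, reaches, {0, 2})
```

When only node 0 sends, node 1 receives the message; when nodes 0 and 2 send together, node 1 lies in both ranges and hears only noise, while node 3 still receives from node 2.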

One of the fundamental tasks in network communication is broadcasting. One node 
of the network, called the source, has a piece of information which has to be transmitted to 
all other nodes. Remote nodes get the source message via intermediate nodes, in several 
hops. One of the most important performance parameters of a broadcasting scheme is 
the total time it uses to inform all nodes of the network. 



1.1 Previous Work 

In most of the research on broadcasting in radio networks [1, 4, 5, 7] the network is
modeled as an undirected graph in which nodes are adjacent if they are in the range of each
other. A lot of effort has been devoted to finding good upper and lower bounds on the
broadcast time in radio networks represented as arbitrary graphs, under the assumption
that nodes have full knowledge of the network. In [1] the authors proved the existence of a
family of $n$-node networks of radius 2, for which any broadcast requires time $\Omega(\log^2 n)$,
while in [5] it was proved that broadcasting can be done in time $O(D + \log^5 n)$ for any
$n$-node network of diameter $D$. In [11] the authors restricted attention to
communication graphs that can arise from actual geometric locations of nodes in the plane. They
proved that scheduling optimal broadcasting is NP-hard even when restricted to such
graphs and gave an $O(n \log n)$ algorithm to schedule an optimal broadcast when nodes
are situated on a line. In [6] the authors discussed fault-tolerant broadcasting in radio
networks arising from geometric locations of nodes on the line and in the plane. On the
other hand, in [2] a randomized protocol was given for arbitrary radio networks where
nodes have no topological knowledge of the network, not even about neighbors. This
randomized protocol runs in expected time $O(D \log n + \log^2 n)$.

1.2 Our Results 

The novelty of our approach consists in considering broadcasting protocols that are 
distributed and deterministic at the same time. We assume that nodes have only local 
knowledge concerning their own position and range and additionally they know the 
maximum R of all ranges. This is a realistic assumption, as the transmitter-receiver 
devices can have varying power but usually belong to a set of a priori known standard 
types. Our aim is to show to what extent this restriction of knowledge concerning the 
network affects efficiency of broadcasting. We consider the simplest scenario when 
nodes are situated at integer points on the line. We prove the lower bound $\Omega(\log^2 R/\log\log R)$ 
on broadcasting time of any deterministic protocol. (This lower bound is of course 
also valid for nodes situated in the plane.) Moreover, we show a broadcasting protocol 
running in time $O(D \log^2 R/\log\log R)$, where D is the depth of the communication graph, i.e., 
the maximum length of a shortest path from the source to any node. Thus our protocol is 
asymptotically optimal for constant D. (In the full version of the paper we will also show 
another protocol running in time $O(D + \log^2 R)$, and thus optimal for $D = \Omega(\log^2 R)$.) 
We also consider the extreme scenario when nodes do not even know their own range. 
Under this assumption we show that every broadcasting protocol must use time $\Omega(R)$ 
for some networks of depth 2. 

Our results lead to the problem of finding a protocol which is asymptotically optimal 
for any network on the line. It would be even more challenging to find good protocols 
for arbitrary networks in the plane. The lower bound $\Omega(\log^2 R/\log\log R)$ seems weak in case 
of the plane and we would not expect protocols as efficient as the one presented in this 
paper. 



2 Preliminaries and Model Description 



Nodes are situated at integer points of the line and are identified with the respective 
integers. Every node v has a non-negative integer range r(v). The set of pairs (v, r(v)), 
for all nodes v, with a distinguished node s called the source, is referred to as a configuration. 
If v sends a message, the signal reaches exactly those nodes that are in the 
segment $[v - r(v), v + r(v)]$. These nodes are said to be in the range of v. However, 
a node situated in the range of two or more nodes that send messages simultaneously 
cannot receive these messages and hears only noise. In particular, a node u which sends a 
message at the same time as one or more nodes in whose range u is situated hears noise. 
It should be stressed that noise is assumed to be different from silence, i.e., collision 
detection is available (cf. [2]). 

Actions of nodes are performed in synchronous time-slots. We consider two models. 
In the main model (considered in sections 4 and 5) the a priori knowledge of every 
node v consists of its position v, its range r(v) and the maximum R over all ranges. It 
is important to stress that nodes do not know positions or ranges of any other nodes. In 
the second model (considered in section 3) this knowledge is further reduced: a node v 
does not even know r(v). We use the latter scenario to prove a strong lower bound on 
broadcasting time. 

In each time-slot a node v receives one of the following inputs: either silence (when 
neither itself nor any other node in whose range v is situated transmits), or a message 
(if a unique node in whose range v is situated transmits), or noise (if v is situated in 
the ranges of at least two simultaneously transmitting nodes). All nodes run a common 
broadcasting protocol. Broadcasting is adaptive: every node can compute its action to 
be performed in a given time slot on the basis of previously received inputs. This action 
is either sending a particular message or keeping silent. 
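
The reception rule of a single time-slot can be illustrated with a short simulation. The following Python sketch is not part of the paper; the function names and the dictionary representation of a configuration (position mapped to range) are assumptions made for illustration only.

```python
SILENCE = "silence"
NOISE = "noise"

def in_range(sender, ranges, receiver):
    """True if receiver lies in the segment [sender - r, sender + r]."""
    return abs(receiver - sender) <= ranges[sender]

def receive(ranges, transmitters):
    """Input heard by every node in one time-slot.

    ranges:       dict position -> range (hypothetical representation)
    transmitters: dict position -> message sent in this slot
    """
    heard = {}
    for v in ranges:
        # Every node is in its own range, so a transmitting node counts itself.
        senders = [u for u in transmitters if in_range(u, ranges, v)]
        if not senders:
            heard[v] = SILENCE
        elif len(senders) == 1:
            heard[v] = transmitters[senders[0]]
        else:
            heard[v] = NOISE
    return heard
```

In this sketch messages are assumed to differ from the two marker strings. A node that transmits alone reads back its own message, while a node that transmits together with another node covering it hears noise, as stated in the model.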

The reachability graph associated with a given configuration is the directed graph G 
whose vertices are the nodes of the configuration, with a directed edge from v to w if 
w is in the range of v. We assume that there exists a directed path from the source s to any 
node of G. Let d(v) denote the length of the shortest directed path in G from s to v. The 
depth D of the graph G (or of the underlying configuration) is defined as the maximum 
of d(v) over all nodes v of the configuration. The set of all nodes of a configuration can 
be partitioned into D + 1 layers $L_0, \ldots, L_D$, where $L_i = \{v : d(v) = i\}$. Clearly, D 
is a lower bound on broadcasting time of any protocol. On the other hand, if all nodes 
know the positions and ranges of all other nodes, i.e., the entire configuration, it is easy 
to construct a distributed deterministic broadcasting protocol working in time $O(D)$. 
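
As an illustration of these definitions, the layers and the depth can be computed by a breadth-first search over the reachability graph. This is a minimal sketch with a hypothetical function name and the same assumed dictionary representation (position mapped to range); it is a centralized computation, not a protocol run by the nodes.

```python
from collections import deque

def layers(ranges, source):
    """Return [L_0, ..., L_D] for the configuration `ranges` (position -> range)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in ranges:
            # Directed edge v -> w exists when w is in the range of v.
            if w not in dist and abs(w - v) <= ranges[v]:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) != len(ranges):
        raise ValueError("some node is not reachable from the source")
    depth = max(dist.values())
    return [sorted(v for v in dist if dist[v] == i) for i in range(depth + 1)]
```

The depth D is then `len(layers(ranges, source)) - 1`.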



3 Lower Bound under Range Ignorance 

We begin by considering the extreme scenario when the a priori knowledge of each node 
is limited to its position and the maximum R over all ranges. (A node does not even 
know its own range.) Under this assumption we show that the worst case broadcasting 
time is fl{R) for some configurations of constant depth. 

We consider the following situation: n + 1 nodes are located at the points 0, 1, ..., n 
on the line. We assume that n is divisible by 3. Node 0 is the source. It has range equal to 
2n/3. Some $k \ge 1$ of the nodes $\{1, 2, \ldots, n/3\}$ have range equal to n. For the remaining 
nodes in $\{1, 2, \ldots, n/3\}$, node i has range equal to $2n/3 - i$. The nodes $\{n/3 + 1, \ldots, n\}$ 
all have range equal to 1. Thus R = n and D = 2. 

We will say a node in $\{1, \ldots, n/3\}$ is strong if its range is n, weak otherwise. There 
are $2^{n/3} - 1$ possible configurations, corresponding to the possible settings of weak and 
strong nodes. 

The protocol which has the nodes 0, 1, . . . , n/3 broadcast in succession gives an 
upper bound of n/3 + 1 on broadcasting time. The remainder of this section is devoted 
to showing that any protocol must use at least n/3 + 1 steps, i.e., the above protocol is 
optimal. 
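
The upper bound can be checked on small instances. The sketch below builds the configuration described above and runs the sequential protocol; the helper names are hypothetical, and the simulation relies on the fact that only one node transmits per step, so no collisions occur.

```python
def section3_config(n, strong):
    """Ranges for nodes 0..n; `strong` is the nonempty set of strong nodes in 1..n/3."""
    assert n % 3 == 0 and strong and all(1 <= i <= n // 3 for i in strong)
    ranges = {0: 2 * n // 3}
    for i in range(1, n // 3 + 1):
        ranges[i] = n if i in strong else 2 * n // 3 - i
    for i in range(n // 3 + 1, n + 1):
        ranges[i] = 1
    return ranges

def sequential_broadcast(ranges, n):
    """Nodes 0, 1, ..., n/3 transmit one per step; return the step at which all are informed."""
    informed = {0}
    for step, sender in enumerate(range(0, n // 3 + 1), start=1):
        if sender in informed:
            informed |= {w for w in ranges if abs(w - sender) <= ranges[sender]}
        if len(informed) == len(ranges):
            return step
    raise RuntimeError("not all nodes were informed")
```

In this sketch the number of steps equals the index of the first strong node plus one, so the worst case n/3 + 1 arises when node n/3 is the only strong node.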

In order to show this lower bound we must be more precise in our model of a protocol. 
We assume that the n + 1 nodes are universal Turing machines running synchronously 
using a global clock initially set to 0. The input to a node i in 1, ..., n is a program $P_i$. 
Node 0 has as input $P_0$ and a string M on a special input tape that is to be broadcast. 
At the completion of the protocol, all nodes will have entered a terminal state and will 
have output M onto a special output tape. All steps of a protocol (except the first) consist 
of three phases: Receive, Compute, Broadcast. During the Receive phase, every node v 
reads from a special reception input tape the result of the Broadcast phase of the previous 
step, which is determined by the rules for packet radio networks and by the nodes in whose 
range v is situated in the given configuration (see the discussion below concerning the 
Broadcast phase). During the Compute phase, the nodes perform an arbitrary computation based 
upon the input they received, their current state (including the contents of all tapes) and 
the program they are running. As a result of this computation each node decides on one 
of two actions, either to broadcast or to be silent during the Broadcast phase. If a node 
decides to broadcast then it writes its state including the contents of all of its tapes, 
the positions of its heads, etc., to a special broadcast output tape. If it decides not to 
broadcast it writes a special symbol indicating “silence” to the broadcast output tape. 
After a Broadcast phase, the reception input tape of a node v contains one of the following: 

1. a special symbol representing “silence”, if none of the nodes in whose range v is 
situated decided to broadcast; 

2. a special symbol representing “noise”, if two or more of the nodes in whose range v 
is situated decided to broadcast; 

3. the contents of the broadcast tape of a node w, if w is the unique node that decided to 
broadcast among nodes in whose range v is situated. 

Recall that every node is in its own range. The first step has no Receive phase. If a 
node enters a terminal state it no longer participates in the protocol. It is assumed that 
the same programs $P_0, \ldots, P_n$ are used for all input configurations and for all messages 
M. We are now ready to state the main theorem of this section: 

Theorem 1. For any deterministic broadcast protocol there exists a configuration on 
which it requires n/3 + 1 steps. 

Proof. Assume to the contrary that there exists a protocol $\mathcal{P}$ which on all configurations 
finishes in $t \le n/3$ steps. Let $|P_i|$ be the number of bits required to describe the input 
program to node i for $\mathcal{P}$. Let M be a string with Kolmogorov complexity [9] greater than 
$n + \sum_{i=0}^{n} |P_i|$ bits. The intuitive reason for choosing a message of this complexity is 
that it precludes the possibility of encoding it in at most n/3 time-slots by silence-noise 
bits. The proof of the theorem is based on the following three lemmas. 

Lemma 2. For input broadcast message M, for all configurations, there exists at least 
one step of protocol $\mathcal{P}$ during which precisely one strong node broadcasts. 

The second lemma shows that the source 0 must take one step to broadcast alone. 

Lemma 3. For input broadcast message M, for all configurations, there exists at least 
one step of protocol $\mathcal{P}$ during which node 0 broadcasts and precisely zero nodes 
among 1, ..., n/3 broadcast. 

The third lemma shows that the actions taken by nodes 1, ..., n/3 for the first 
$t \le n/3$ steps of any protocol are the same for all configurations. 

Lemma 4. For input broadcast message M, at the end of step $i \le n/3$ of protocol $\mathcal{P}$, 
the state of each of the nodes $0, 1, \ldots, 2n/3 - i + 1$ (including the contents of their 
broadcast tape) is the same for all configurations. 

As a consequence of Lemma 4, for protocol $\mathcal{P}$ running in $t \le n/3$ steps on broadcast 
input message M, for each of the nodes 0, 1, ..., n/3, its actions, i.e., whether 
it broadcasts or is silent, are the same during each of the steps of the protocol, for all 
configurations. Thus, we can consider the actions of these nodes as consisting of t sets 
$S_1, S_2, \ldots, S_t$, where set $S_i$ is the subset of these nodes that broadcast during step i. We 
are now ready to complete the proof of the theorem. 

Consider the protocol $\mathcal{P}$ on broadcast input message M. By Lemma 3 one of the t 
steps of $\mathcal{P}$ must have node 0 broadcast while all nodes in 1, ..., n/3 are silent. Consider 
the remaining $t - 1 < n/3$ steps, the only ones during which the nodes 1, ..., n/3 can 
broadcast. Assume that none of the sets associated with these steps are singletons. Then 
for the configuration consisting of all strong nodes, in all steps, either all strong nodes are 
silent or two or more strong nodes broadcast. This contradicts Lemma 2. Therefore there 
must be at least one singleton set. For all singleton sets, assign the weak range to the node 
in the set. Now remove all nodes assigned weak range from the sets. If after this process 
no singletons are created, then the configuration with all remaining nodes assigned the 
strong range again contradicts Lemma 2. Assign the resulting singletons weak range, and 
continue with this process. It must stop with all sets having been reduced to singletons 
or to the empty set. Since $t - 1 < n/3$, there exists at least one node which does not 
appear in any singleton. Consider the configuration where that node is strong and all 
others are weak. In this configuration, the given node never broadcasts and therefore 
Lemma 2 is again contradicted. Therefore, no such protocol $\mathcal{P}$ exists and at least t + 1 
steps are required to solve the problem. □ 
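
The adversary's elimination process in this proof can be written as a short procedure. The sketch below is an illustration under stated assumptions, not the paper's formal argument: `steps` is a hypothetical list of the sets of nodes from 1, ..., n/3 that broadcast in each step, and the function returns a set of strong nodes for which no step has exactly one strong node broadcasting.

```python
def adversary_configuration(steps, candidates):
    """Pick strong nodes so that no set in `steps` contains exactly one of them.

    steps:      list of sets of nodes that broadcast in each step
    candidates: the nodes 1, ..., n/3
    """
    weak = set()
    while True:
        # Remove the nodes already made weak and look for singleton sets.
        reduced = [s - weak for s in steps]
        singletons = {next(iter(s)) for s in reduced if len(s) == 1}
        if not singletons:
            break
        weak |= singletons
    strong = set(candidates) - weak
    if not strong:
        raise ValueError("expected fewer steps than candidate nodes")
    return strong
```

Each set contributes at most one weak node, so with fewer sets than candidate nodes at least one strong node always remains, which mirrors the counting step of the proof.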

It easily follows from considerations in the next section that in the case when each node 
knows its position and range, as well as the maximum R over all ranges, broadcasting 
for configurations considered above can be done in time $O(\log R)$. (See Algorithm 1 - 
leader election in a cluster.) 



4 Broadcast Protocols with Known Range 

In this section we show that if every node knows its own range and position, as well as 
the maximum R over all ranges, the lower bound from section 3 can be dramatically 
invalidated. We show a broadcast protocol running in worst-case time $O(\log^2 R/\log\log R)$, for 
all configurations of depth 2. (Recall that without knowledge of nodes’ own range we 
showed such a configuration requiring time $\Omega(R)$.) This protocol can be generalized to 
give time $O(D \log^2 R/\log\log R)$, for all configurations of depth D. The lower bound to be proved 
in section 5 shows that the above protocol is asymptotically optimal for constant D. 

For any nonempty set S of nodes, a set $S' \subseteq S$ is right-equivalent to S if 
$\max\{v + r(v) : v \in S'\} = \max\{v + r(v) : v \in S\}$. Left-equivalent subsets are defined similarly. 
We will first restrict our considerations to informing nodes larger than the source, and 
we will use the term equivalent instead of right-equivalent. 

For simplicity we assume that R is a power of 2. Modifications in the general case 
are obvious. For every integer j and every $l = 0, 1, \ldots, \log R$, we define 
$I(j, l) = \{j2^l, j2^l + 1, \ldots, (j + 1)2^l - 1\}$. Fix a layer L. Assume, without loss of generality, that 
all ranges of nodes in L are strictly positive. The cluster $C(j, l)$ is defined as the set of 
nodes $\{v \in L \cap I(j, l) : 2^l \le r(v) < 2^{l+1}\}$, if this set is nonempty. The integer l is 
called the level of the cluster $C(j, l)$. A node $v \in L$ belongs to the cluster $C(j(v), l(v))$, 
where $l(v) = \lfloor \log r(v) \rfloor$ and $j(v) = \max\{j : j2^{l(v)} \le v\}$. 

Lemma 5. Clusters form a partition of L. Every pair of nodes in a cluster are in each 
other's range. 
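
The cluster of a node depends only on its own position and range, which is what makes the partition computable locally. A minimal sketch, assuming nonnegative integer positions, strictly positive ranges and hypothetical function names, groups a layer into clusters and checks the second claim of Lemma 5 on an example:

```python
from collections import defaultdict

def level(r):
    """l(v) = floor(log2 r(v)) for a strictly positive integer range."""
    return r.bit_length() - 1

def clusters(layer):
    """layer: dict position -> range (> 0). Returns {(j, l): sorted positions}."""
    result = defaultdict(list)
    for v, r in sorted(layer.items()):
        l = level(r)
        j = v >> l  # largest j with j * 2**l <= v
        result[(j, l)].append(v)
    return dict(result)

def mutually_in_range(layer, members):
    """True if every pair of the given nodes are in each other's range."""
    return all(abs(u - v) <= min(layer[u], layer[v])
               for u in members for v in members)
```

Two nodes of the same cluster lie in an interval of length $2^l$ and both have range at least $2^l$, which is why the check succeeds.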

The leader of a cluster is its node v with maximum value $v + r(v)$. If there are many 
such nodes then the leader is the one with maximum range among them. Notice that 
the singleton set of a leader is equivalent to the cluster. A node $u \in X \subseteq L$ is called 
X-nonessential if there exists $v \in X$ such that u is in the range of v and either (1) 
$v + r(v) > u + r(u)$ or (2) $v + r(v) = u + r(u)$ and $v < u$. 

The set of all X-nonessential nodes is denoted by $X^-$ and the set of all other nodes in 
X (called X-essential) is denoted by $X^+$. 

Lemma 6. The set X+ is equivalent to X. 
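
Lemma 6 can be checked on small examples by computing the essential nodes directly from the definition. The following sketch uses hypothetical function names and the dictionary representation (position mapped to range) assumed for illustration.

```python
def essential(X):
    """X: dict position -> range. Returns the set of X-essential nodes."""
    def nonessential(u):
        for v in X:
            if v == u or abs(u - v) > X[v]:
                continue  # u must be in the range of a different node v
            reach_v, reach_u = v + X[v], u + X[u]
            if reach_v > reach_u or (reach_v == reach_u and v < u):
                return True
        return False
    return {u for u in X if not nonessential(u)}

def right_reach(X, nodes):
    """max{v + r(v)} over the given nodes."""
    return max(v + X[v] for v in nodes)
```

For example, with `X = {0: 3, 2: 1, 3: 4, 6: 1, 9: 0}` only nodes 3 and 9 are essential, and they reach as far to the right as the whole set.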
We first construct a broadcasting protocol working in time $O(\log^2 R/\log\log R)$ for 
configurations of depth 2. To this end we will solve the following problem. Consider a layer 
$L \subseteq \{0, \ldots, R - 1\}$, for which we have $v + r(v) > R$, for any node $v \in L$. We want to 
construct a small subset of L equivalent to L. We will show how to construct such a set of 
size $O(\log R)$ in time $O(\log^2 R/\log\log R)$. This will yield the desired protocol for configurations 
of depth 2, with all nodes in this small equivalent subset broadcasting sequentially. 

Lemma 7. For every level $l = 0, 1, \ldots, \log R$, the set L contains at most two (consecutive) 
clusters $C(R/2^l - 2, l)$ and $C(R/2^l - 1, l)$. 

It follows that the number of clusters in L is at most $2 \log R + 2$. The set of leaders of 
all clusters is equivalent to L. In fact, the small equivalent set that we seek is a subset 
of the set of all leaders. 

Let $L_e$ ($L_o$) be the set of those nodes $v \in L$ for which $j(v)$ is even (odd). 

Lemma 8. The set $L_e^+ \cup L_o^+$ is equivalent to L and has size $O(\log R)$. 

We now show how to construct $L_o^+$. The construction of $L_e^+$ is similar. Consider 
clusters $C(j, l)$ with odd j. 

Lemma 9. Let $l_1 \ge l_2 + 3$. If there exists a node $v \in C(j_1, l_1)$ such that 
$v \ge R - 3 \cdot 2^{l_1 - 3} + 1$ then all nodes in $C(j_2, l_2)$ are $L_o$-nonessential. 

Clusters $C(j_2, l_2)$ satisfying Lemma 9 are called useless. All other clusters are called 
useful. 

Lemma 10. Let $l_1 \ge l_2 + 3$. If $v \in C(j_1, l_1)$ and $v < R - 3 \cdot 2^{l_1 - 3} + 1$ then v is not 
in the range of any node from $C(j_2, l_2)$. 

A sequence of clusters $\gamma = (C(j_1, l_1), \ldots, C(j_s, l_s))$ is called a chain if it satisfies 
the following conditions: 

1. $l_1 > \cdots > l_s$, 

2. for all i, all nodes from clusters of levels $l_i, l_{i+1}, \ldots, l_s$ are in the range of all nodes in 
cluster $C(j_i, l_i)$ but no node from clusters of levels $l_1, \ldots, l_{i-1}$ is in the range of a node 
from cluster $C(j_i, l_i)$. 

It follows from Lemmas 9 and 10 that useful clusters in layer $L_o$ can be partitioned 
into three chains, $\gamma_0, \gamma_1, \gamma_2$, where $\gamma_i$ is the sequence of consecutive useful clusters on 
levels l with $l \bmod 3 = i$. 

We can now formulate a high-level description of an algorithm constructing $L_o^+$. It 
consists of three phases. 

1. Find and eliminate useless clusters. 

2. In every chain $\gamma_i$, for i = 0, 1, 2, find leaders of clusters from this chain. 

3. Eliminate nonessential nodes from the set of leaders found in phase 2. 

We give the details only of phase 2 which is the most difficult. We show how to find 
leaders of all clusters in a chain in time $O(\log^2 R/\log\log R)$. Define $b(j, l) = (j + 1)2^l - 1$. 
We assign to every node $v \in C(j, l)$ its label defined as follows: 
$lab(v) = (v + r(v) - b(j, l) - 1)R + (b(j, l) - v)$. Notice that $0 \le lab(v) \le R^2 - R$. 

Lemma 11. Different nodes in a cluster have different labels. A node v is a leader in 
$C(j, l)$ if and only if $lab(v) > lab(u)$ for all nodes u in $C(j, l)$ different from v. 

Every node v can compute parameters $j(v)$ and $l(v)$, as well as its label $lab(v)$, 
knowing its position and range. Labels will be used to elect a leader by binary search. 
Upon completion of the algorithm the value of a boolean variable leader(v) informs the 
node v if it is a leader. 

Algorithm 1: Election of a leader in a cluster - explicit binary search. Algorithm for 
node v. 

    $l := 0$; $r := R^2 - R$; 
    while $r - l > 0$ do 
        $m := \lfloor (l + r)/2 \rfloor$; 
        if $m + 1 \le lab(v) \le r$ then broadcast /*/ else keep silent; 
        if silence then $r := m$ else $l := m + 1$; 
    leader(v) := ($lab(v) = l$); 
    if leader(v) then broadcast $(v, r(v))$. /**/ 

In step /*/ the node broadcasts any message (it is sufficient just to send a signal). In 
step /**/ messages may be different, depending on the purpose the leader is used for. 
In our case we want to identify nonessential leaders in clusters of the chain. Hence the 
leader broadcasts {v,r{v)). 

For all nodes $v \in C(j, l)$, values of l and r are the same after each turn of the 
while loop. If v is a leader, we have $l \le lab(v) \le r$. Hence a leader in a cluster can be 
elected in time $\Theta(\log R)$. Algorithm 1 could be used to elect leaders in each cluster of 
the chain separately. However, this would result in time $\Theta(\log^2 R)$. In order to speed up 
the process, we need to elect leaders in many clusters simultaneously. However, in doing 
so, we need to avoid interference among nodes from different clusters broadcasting at 
the same time. 
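
The behaviour of Algorithm 1 inside a single cluster can be simulated centrally. In the sketch below (hypothetical function names; a simulation, not the distributed protocol itself) a time-slot is non-silent exactly when some node of the cluster has its label in the upper half of the current interval, because all nodes of a cluster hear each other.

```python
def lab(v, r, R):
    """Label of a node at position v >= 0 with range 0 < r <= R."""
    l = r.bit_length() - 1          # level l(v) = floor(log2 r)
    j = v >> l                      # index j(v) of the interval I(j, l)
    b = (j + 1) * 2 ** l - 1        # right end b(j, l) of that interval
    return (v + r - b - 1) * R + (b - v)

def elect_leader(labels, R):
    """labels: dict node -> lab(node) for one cluster. Returns (leader, slots used)."""
    lo, hi, slots = 0, R * R - R, 0
    while hi - lo > 0:
        mid = (lo + hi) // 2
        # Nodes with mid + 1 <= lab <= hi broadcast; everyone hears non-silence.
        if any(mid + 1 <= x <= hi for x in labels.values()):
            lo = mid + 1
        else:
            hi = mid
        slots += 1
    leader = next(v for v, x in labels.items() if x == lo)
    return leader, slots
```

For instance, with R = 8 the nodes at positions 4, 5, 6 with ranges 5, 4, 4 form one cluster with labels 11, 10 and 17, and the search isolates node 6, the node with the largest value of $v + r(v)$.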

We start with a generalization of Algorithm 1. $P = R^2 - R + 1$ is the number of 
possible labels of nodes. Let $S \ge \lceil \log P \rceil$ be an integer and let A be an arbitrary set of 
size P of binary sequences of length S. Denote by $\alpha_0, \alpha_1, \ldots, \alpha_{P-1}$ the lexicographic 
ordering of sequences from A. Assign to every node v its binary label 
$binlab(v) = \alpha_{lab(v)}$. Clearly $binlab(u) > binlab(v)$ if and only if $lab(u) > lab(v)$. For a sequence 
$\alpha$, $pref(\alpha, i)$ denotes its prefix of length i, and $\alpha[i]$ denotes the ith term of $\alpha$. In particular, 
$pref(\alpha, 0)$ is the empty sequence $\epsilon$. 

Suppose that every node knows its binary label with respect to a given set A. The 
following algorithm elects a leader in a cluster, using binary labels. 

Algorithm 2: Election of a leader in a cluster - implicit binary search. Algorithm 
for node v. 

    $\beta := \epsilon$; $\alpha := binlab(v)$; 
    for $i := 1$ to S do 
        if $pref(\alpha, i - 1) = \beta$ and $\alpha[i] = 1$ then broadcast else keep silent; 
        if silence then $\beta := \beta \cdot 0$ else $\beta := \beta \cdot 1$; 
    leader(v) := ($\beta = \alpha$); 
    if leader(v) then broadcast $(v, r(v))$. 

It can be easily shown by induction on the number of iterations of the for loop that 
the sequence $\beta$ is the same for all $v \in C(j, l)$, and that $\beta = pref(binlab(v), i)$, if v 
is a leader. The correctness of Algorithm 2 follows from this observation. Moreover, 
if $S = \lceil \log P \rceil$ and A is the set of all binary sequences of length S, Algorithm 2 is a 
restatement of Algorithm 1. 
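
Algorithm 2 can be simulated in the same centralized way. In this sketch binary labels are represented as Python strings of '0' and '1' of equal length S, an assumption made for illustration; the function also counts the non-silent slots, which equal the number of 1 terms in the leader's binary label.

```python
def elect_leader_implicit(binlabels):
    """binlabels: dict node -> binary label (string over '0'/'1', common length S).

    Returns (leader, number of non-silent slots).
    """
    S = len(next(iter(binlabels.values())))
    beta, noisy_slots = "", 0
    for i in range(S):
        # A node broadcasts if its label extends beta and its next term is 1.
        broadcasters = [v for v, a in binlabels.items()
                        if a[:i] == beta and a[i] == "1"]
        if broadcasters:
            beta += "1"
            noisy_slots += 1
        else:
            beta += "0"
    leader = next(v for v, a in binlabels.items() if a == beta)
    return leader, noisy_slots
```

With the labels 11, 10 and 17 written as 6-bit strings, `elect_leader_implicit({4: "001011", 5: "001010", 6: "010001"})` elects node 6 using two non-silent slots. This is why labels with few 1 terms are attractive when several clusters run the algorithm at once.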

We will now use Algorithm 2 to find leaders in all clusters of a chain $\gamma$ simultaneously. 
In each cluster a separate “copy” of the algorithm will perform election. Notice that if 
no node broadcasts in a given step, we can extend the sequence $\beta$ by one bit: 0, in every 
cluster. Clearly, in nontrivial cases, exclusively silent steps cannot accomplish leader 
election. Nevertheless, we will keep the number of “noisy” steps small, and at the same 
time elect leaders fast. Noisy steps are a problem because nodes from a given cluster 
can be heard in all clusters of lower levels. In order to prevent this interference from 
disturbing computations in clusters of lower levels, we add, for each step of Algorithm 
2 performed in any cluster, two steps verifying if noise heard by nodes in this cluster is 
not caused by nodes from clusters of higher levels. If it is, nodes from this cluster repeat 
the same step of Algorithm 2 and we say that the cluster is delayed. If we guarantee 
that each cluster is delayed only during $O(\log^2 R/\log\log R)$ steps and that Algorithm 2 works 
in time $O(\log^2 R/\log\log R)$ (i.e., $S \in O(\log^2 R/\log\log R)$), then leaders in all clusters will be elected in 
time $O(\log^2 R/\log\log R)$. 

To this end we show that the set A of sequences can be chosen to make the number 
of “noisy” steps $O(\log R/\log\log R)$. Then no cluster will be delayed more than $O(\log^2 R/\log\log R)$ 
steps. 

Let S = and H = fuSpl- 



Lemma 12. There exist at least P binary sequences of length S containing at most H 
terms 1. 
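
The following calculation sketches why a counting bound of this kind can hold. The values $H = \lceil 2\log R/\log\log R \rceil$ and $S = H\lceil \log R \rceil$ used here are assumptions chosen for illustration (they have the orders of magnitude required above and need not be the exact constants of the lemma), and $R \ge 4$ is assumed so that $\log\log R \ge 1$.

```latex
\begin{align*}
\binom{S}{H} &\ge \left(\frac{S}{H}\right)^{H} = \lceil \log R \rceil^{H}
  \ge (\log R)^{2\log R/\log\log R} \\
 &= 2^{\log\log R \cdot 2\log R/\log\log R} = 2^{2\log R} = R^{2}
  \ge R^{2} - R + 1 = P .
\end{align*}
```

Already the sequences with exactly H terms 1 are therefore at least P in number under these assumptions.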



Thus we can take as A any set of P binary sequences of length S containing at 
most H terms 1. It remains to show how nodes that hear noise can determine that it is 
caused by broadcasting nodes from clusters of higher levels. Suppose that nodes know 
the consecutive number of their cluster in the chain. (This can be learned in $O(\log R)$ 
steps.) Fix a time unit i in which various steps of Algorithm 2 are performed in various 
clusters. In time unit i + 1 (the first verifying step), all nodes that heard noise in time 
unit i and are in clusters with even number, broadcast. In time unit i + 2 (the second 
verifying step), all nodes that heard noise in time unit i and are in clusters with odd 
number, broadcast. Notice that if some cluster heard noise in time unit i, caused by 
a higher level cluster, it must hear noise in both verifying steps. Such clusters repeat 
the step of Algorithm 2 performed in time unit i. On the other hand, clusters that hear 
noise in only one verifying step can perform the next step of Algorithm 2 because all 
nodes of this cluster know that noise heard in time unit i was caused by nodes from this 
cluster. In order to keep synchrony, nodes that hear nothing in time unit i perform the 
corresponding step of Algorithm 2 and wait two time units. 

Hence it is possible to elect leaders in all clusters of a chain in time $O(\log^2 R/\log\log R)$ and 
consequently phase 2 can be performed in time $O(\log^2 R/\log\log R)$. Thus we have shown how 
to construct a set of size $O(\log R)$ equivalent to the layer L, in time $O(\log^2 R/\log\log R)$. This 
implies that broadcasting in configurations of depth 2 can be done in time $O(\log^2 R/\log\log R)$. 
This result can be easily generalized as follows. 

Theorem 13. Broadcasting in a configuration of depth D can be done in time 
$O(D \log^2 R/\log\log R)$. 



5 Lower Bound with Known Range 



This section is devoted to establishing the lower bound $\Omega(\log^2 R/\log\log R)$ on broadcasting time 
of any deterministic protocol in our main model, i.e., even when each node knows its 
own range. This lower bound will show that the protocol from section 4 is asymptotically 
optimal for constant D. For simplicity we use a more informal style than that from section 
3. It is not difficult, however, to reformulate the argument using Turing machines, states 
of nodes, etc. As before, and for the same reasons, we need to take a broadcast message 
of sufficiently high Kolmogorov complexity. 

Theorem 14. For any deterministic broadcast protocol there exists a configuration of 
depth 2 on which it requires $\Omega(\log^2 R/\log\log R)$ steps. 

Proof. Nodes are situated at nonnegative integers. 0 is the source and its range is R, 
whichisapowerof2. Let /i = [| log andp = 2^.Leta:o, Xh-i be a decreasing 
sequence of integers defined by: = i? + 1 — (3 • 2* — 2) • 2^. Notice that Xi > 1, for 

all i = 0, 1, ..., h — 1. 

For all j = 0, 1, ..., h − 1, we define $I_j$ as the segment $\{x_j, x_j + 1, \ldots, x_j + p - 1\}$. 
These segments are pairwise disjoint and a segment with higher index is to the left of 
a segment with lower index. Layer 1 is a subset of the union of these intervals. Denote 
$y_j = R + p - j$ and let $r(v) = y_j - v$, for all $v \in I_j$. Thus the range of every node in 
the first layer is at least p. Every pair of nodes in the same segment are in each other’s 
range. Moreover, all nodes from segment $I_j$ are in the range of nodes from segments $I_k$, 
for k > j but not in the range of nodes from segments $I_k$, for k < j. Integers $y_j$ form a 
descending sequence and $y_j$ is in the range of nodes from $I_0, \ldots, I_j$ but not in the range 
of other nodes from layer 1. 

The adversary will choose sets $C_j \subseteq I_j$ of nodes. Whenever $C_j$ is nonempty, the 
adversary places a node in $y_j$ and assigns $r(y_j) = 0$. Such a node must be informed 
but cannot inform any other node. The entire configuration consists of the source, of the 
union of sets $C_j$, for j = 0, 1, ..., h − 1 and of the above nodes $y_j$. More precisely, layer 
1 is equal to $C_0 \cup \cdots \cup C_{h-1}$ and layer 2 consists of corresponding nodes $y_j$. Hence 
D = 2. 

Assume that all nodes in the first layer already know the source message. (This 
requires one step.) We will show that subsets $C_j$ can be chosen in such a way that 
$\Omega(\log^2 R/\log\log R)$ steps are needed to inform all nodes of layer 2. Notice that the source is not in 
the range of any other node, hence it cannot modify its actions according to adversary 
decisions. 

Let t = and let A be a broadcast protocol informing any configuration of 

the above type in at most t steps. A node $v \in C_j$ is called solitaire in $C_j$ if, in some 
step of A, v is the only broadcasting node from $C_j$. The rest of the proof is based on the 
following lemmas. 

Lemma 15. Every nonempty set $C_j$ must contain a solitaire. 

Lemma 16. Fix j. For some nonempty set $C_j$, the number x of steps in which at least 
two nodes from $C_j$ broadcast, according to protocol A, prior to the first step in which a 
solitaire in $C_j$ broadcasts, is at least $m = \lfloor dh/\log h \rfloor$, where d is a constant independent 
of h. 

Now suppose that during the selection process of the solitaire in $C_j$ some nodes from 
$C_k$, k > j, broadcast in steps $t_1, \ldots, t_s$. Is it possible to take advantage of these steps 
in order to reduce the number of remaining steps in which nodes in $C_j$ broadcast? We 
will show that this is not the case, by constructing a set $C_j$ with a stronger property than 
above: 

Lemma 17. The number of steps other than $t_1, \ldots, t_s$ in which at least two nodes from 
$C_j$ broadcast, according to protocol A, prior to the first step in which a solitaire in $C_j$ 
broadcasts, is at least m. 

In order to finish the proof of the theorem, we show that $\Omega(\log^2 R/\log\log R)$ steps are 
required to inform all nodes in layer 2. Consider the segment $I_{h-1}$. We have shown 
that there exists a subset $C_{h-1} \subseteq I_{h-1}$ for which at least $\lfloor dh/\log h \rfloor$ “noisy” steps are 
required before a solitaire is chosen. If the latest of these steps already exceeds the claimed 
bound, then the proof is finished. Otherwise, we consider the segment $I_{h-2}$. There exists a subset 
$C_{h-2} \subseteq I_{h-2}$ for which at least $\lfloor dh/\log h \rfloor$ additional “noisy” steps are required before a 
solitaire is chosen. If the latest of these steps exceeds the claimed bound, then the proof is finished. 
Otherwise, we proceed to the construction of $C_{h-3}$, and so on. After h stages the number 
of steps required will become at least $\lfloor dh/\log h \rfloor \cdot h = \Omega(\log^2 R/\log\log R)$. □ 
Abstract. Our model in this paper is the standard, two-dimensional n × n mesh. 
The first result is a randomized algorithm for h-h routing which runs in $O(hn)$ 
steps with high probability using queues of constant size. The previous bound 
is $0.5hn + o(hn)$ but needs the queue-size of $\Omega(h)$. An important merit of this 
algorithm is to give us improved bounds by applying several schemes of faulty-mesh 
routing. For example, the scheme by [Rag95], originally $O(n \log n)$ time 
and $O(\log^2 n)$ queue-size, gives us an improved routing algorithm on p-faulty 
meshes (p < 0.4) which runs in $O(n \log^2 n / k)$ time using $O(k)$ queue-size for any 
$k \le \log n$. Thus, when k = log n it improves the queue-size by the factor of log n 
without changing the time bound and when k is constant, it needs only constant 
queue-size although the running time slows down by the factor of log n. 



1 Introduction 

One of the most studied parallel models with a fixed interconnection network is a two- 
dimensional mesh-connected processor array (a 2-D mesh for short). In this model, 
n × n processors are placed at intersections of horizontal and vertical grids, where each 
processor is connected to its four neighbors via point-to-point communication links and 
can communicate with them in a single step (more generally, each processor can transfer 
one message per link in a single time-unit). 

Packet routing is clearly a fundamental problem in the area of parallel and/or distributed 
computing and a great deal of effort has been devoted to the design of efficient 
algorithms on meshes. A special case of the routing problem is permutation routing, in 
which every processor is a source and destination of precisely one packet. Among several 
variations of the routing problem, permutation routing has been the most popular one 
since it has been considered to be a standard benchmark to evaluate the overall efficiency 
of communication schemes. The efficiency of a routing algorithm is generally measured 
by its running time. However, efficiency in the queue-size of each processor, i.e., reducing 
the maximum number of temporal packets that can be held by a single processor 
at the same time, has been considered to be equally important. This is of course due to 
practical reasons, but another obvious reason is that the restriction of the queue-size 
makes the problem far more interesting from theoretical viewpoints. 



Consequently, many researchers have been interested in routing algorithms with constant
queue-size: Leighton, Makedon, and Tollis [LMT95] and Sibeyn, Chlebus, and Kaufmann [SCK97]
gave deterministic algorithms with running time 2n - 2, matching the network diameter, and
with constant queue-size. However, their algorithms have the flavor of mesh-sorting algorithms
and may be too complicated to implement on existing computers. Hence oblivious path selection
is another well-received approach [BH85,BRSU93,KKT91]. In oblivious path selection, the entire
path of each packet has to be completely determined by its source and destination before
routing starts. A typical oblivious strategy for mesh networks is the dimension-order
algorithm: A packet first moves horizontally to its destination column and then moves
vertically to its destination row. It is well known that, in spite of its very regular paths,
the algorithm can route any permutation on the mesh in 2n - 2 steps. Unfortunately, some
processor requires an Ω(n)-size queue in the worst case. In order to reduce the queue-size,
randomized techniques based on the Valiant-Brebner algorithm [VB81] are often used: The
packets are first sent to random destinations and then routed to their final destinations.
Valiant and Brebner gave a simple, randomized, oblivious routing algorithm which runs in
3n + o(n) steps with high probability. Rajasekaran and Tsantilas [RT92] reduced the time bound
to 2n + O(log n). However, the queue-sizes of those algorithms still grow as large as
Ω(log n / log log n). Until very recently, little had been known about whether the queue-size
can be decreased to a constant without increasing the time bound or sacrificing obliviousness.
Last year, Iwama, Kambayashi and Miyano made significant progress on this problem. Following
their intermediate result in [IKM98], Iwama and Miyano finally gave an O(n) deterministic
oblivious algorithm on the 2-D mesh with constant queue-size [IM99].
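
As an illustration of the dimension-order strategy described above, the following minimal sketch computes the path of a single packet from its source and destination alone, which is exactly what makes the strategy oblivious. The function name and the (row, column) convention are assumptions made for this example; contention between packets and queueing are not modeled.

```python
def dimension_order_path(src, dst):
    """Return the positions visited by one packet, as (row, col) pairs.

    The packet first moves along its row to the destination column and
    then along that column to the destination row.
    """
    (r, c), (dst_r, dst_c) = src, dst
    path = [(r, c)]
    step = 1 if dst_c >= c else -1
    while c != dst_c:          # horizontal leg
        c += step
        path.append((r, c))
    step = 1 if dst_r >= r else -1
    while r != dst_r:          # vertical leg
        r += step
        path.append((r, c))
    return path
```

On an n x n mesh the longest such path, from one corner to the opposite corner, uses 2n - 2 links, which is the network diameter mentioned above.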



The present paper deals with more general cases of packet routing in the following two senses:
One is the h-h routing problem on 2-D meshes; i.e., at most h packets originate from any
processor and at most h packets, whose original positions may be different, are destined for
any processor. (The other is routing on faulty meshes, discussed later.) h-h routing clearly
reflects practical implementations better than permutation routing, or 1-1 routing, since
each processor usually generates many packets during a specific computation. For h-h routing
on 2-D meshes, an easy hn/2 lower bound follows from a fundamental property of the model; this
bound is known as the bisection bound. The first nontrivial upper bound for h-h routing was
given by Kunde and Tensi [KT89]. Their deterministic algorithm runs in 5hn/4 + o(hn) steps.
Since then, several improvements have been made: Kaufmann and Sibeyn [KS97] and Rajasekaran
[Raj95] showed (originally in SPAA92) that randomized algorithms based on the Valiant-Brebner
algorithm [VB81] can solve h-h routing in time hn/2 + o(hn), which almost matches the
bisection bound. Then [KSS94] and [Kun93] developed deterministic algorithms of similar
performance. However, all these algorithms require Ω(h)-size queues in the worst case.

In this paper, we present a randomized algorithm for h-h routing which needs only constant
queue-size. It runs in O(hn) steps with high probability. This algorithm follows the same line
as the algorithm of Iwama and Miyano for permutation routing [IM99], but our randomization
approach is completely different from the Valiant-Brebner algorithm. Although the running time
is worse than the bounds above, our algorithm has the following two important merits: (i) The
maximum queue-size is bounded by a constant that does not depend on h. (ii) The algorithm is
oblivious, i.e., the path of any packet is determined solely and deterministically by its
source and destination.

The other generalization is to add fault-tolerance to mesh routing. In the p-faulty mesh, each
processor may fail independently with some probability bounded above by a value p. Routing on
p-faulty meshes has also been popular, and many algorithms have been developed for it
[KKL+90,Mat92,Rag95]. Generally speaking, they are divided into two categories: the first type
can only cope with static faults, i.e., faulty processors are fixed throughout the computation
and their locations are known in advance. The second type of algorithm can adapt on-line to
dynamically occurring faults.

Another important merit of our h-h routing algorithm is that it yields improved bounds when
combined with several schemes for faulty-mesh routing. For example, the scheme of [Rag95] for
dynamic faults, originally with O(n log n) time and $O(\log^2 n)$ queue-size, gives us a
routing algorithm on p-faulty meshes (p < 0.4) which runs in $O(n \log^2 n / k)$ time using
O(k) queue-size for any k ≤ log n (Raghavan first assumed that p is bounded above by 0.29 and
then Mathies [Mat92] improved the probability bound to p < 0.4). Thus, when k = log n, it
improves the queue-size by a factor of log n, and when k is constant, it needs only constant
queue-size although the running time slows down by a factor of log n. Another example is the
scheme of [KKL+90] for static faults. In this case, no significant improvement is possible
since their algorithm already runs in linear time and uses queues of constant size. However,
it is possible to make the algorithm oblivious and much simpler; the original one is based on
sorting-like routing.

In what follows, we first describe our models and problems more formally in the next section.
Our h-h routing algorithm is presented in Section 3, including its basic ideas, the algorithm
itself, and the analysis of its time complexity. In Section 4, we show how to apply this h-h
routing algorithm to obtain improved routing algorithms on faulty meshes.



2 Our Models and Problems 

Two-dimensional meshes are illustrated in Figure 1. A position is denoted by (i, j),
1 ≤ i, j ≤ n, and a processor whose position is (i, j) is denoted by $P_{i,j}$. Each processor
is connected to four neighbors, and a connection between neighboring processors is called a
(communication) link. The processors operate in a synchronous fashion. In a single time-unit,
say t, each processor can perform an arbitrary amount of internal computation and communicate
with all its neighbors. As usual, we assume that t = 1, so the running time of an algorithm is
determined by the number of communication time-units. By the queue-size, we mean the working
queue-size, i.e., the maximum number of packets a processor can temporarily hold at the same
time in the course of routing.

The problem of h-h routing on meshes is defined as follows: Each processor initially holds at
most h (unordered) packets. Each packet, (s, d), consists of two portions: s is a source
address that shows the initial position of the packet and d is a destination address that
specifies the processor to which the packet should be moved. (A real packet includes more
information besides its source and destination, such as its body data, but this is not
important in this paper and is omitted.) Routing requires that each packet be routed
independently to its destination, and every processor is the destination of at most h packets.
h-h routing is called permutation routing if h = 1.
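
The definition can be made concrete with a small checker. This is only an illustrative sketch: the function name and the representation of a packet as a pair of (row, column) tuples are assumptions of this example, not part of the paper.

```python
from collections import Counter


def is_h_h_instance(packets, n, h):
    """Check that `packets` is a valid h-h routing instance on an n x n mesh.

    `packets` is an iterable of (source, destination) pairs, where each
    address is a (row, col) tuple with 1 <= row, col <= n.
    """
    packets = list(packets)

    def in_mesh(pos):
        return 1 <= pos[0] <= n and 1 <= pos[1] <= n

    if not all(in_mesh(s) and in_mesh(d) for s, d in packets):
        return False
    per_source = Counter(s for s, _ in packets)
    per_dest = Counter(d for _, d in packets)
    # At most h packets start at, and at most h packets end at, any processor.
    return (all(count <= h for count in per_source.values())
            and all(count <= h for count in per_dest.values()))
```

With h = 1 the same check accepts exactly the instances in which no processor sends or receives more than one packet, which includes every permutation.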

If we fix an algorithm and an instance, then the path R of each packet is determined, which is
a sequence of processors, $P_1$ (= source), $P_2$, ..., $P_j$ (= destination). A routing
algorithm, A, is said to be oblivious if the path of each packet is completely determined by
its source and destination.

As for faulty-mesh routing, a mesh is called a p-faulty mesh if each processor fails with a
certain fixed probability p, independently of the others. In this paper, we assume that only
processors fail, and that these failures are dynamic and their locations are not known before
routing starts. If a processor fails, it can neither compute nor communicate with its
neighbors, i.e., only the so-called crash-type failure happens. See [GHKS98] for more details
on faulty meshes.



3 h-h Randomized Routing 

3.1 Basic Ideas 

Kaufmann and Sibeyn [KS97], and Rajasekaran [Raj95] gave O(hn) randomized algorithms for the
h-h routing problem. Their randomization techniques are based on [VB81], i.e., all the packets
are first distributed temporarily to random destinations and then routed from there to their
final destinations. This random selection of intermediate positions avoids serious
path-congestion by distributing packets evenly, but it increases the maximum queue-size: Those
algorithms require Ω(h) queue-size in the worst case.

Our algorithm is also randomized but its idea is different from the above. To see this,
consider the following simple observation: As mentioned before, for permutation routing,
there are several O(n) algorithms with constant queue-size. If every processor picks one
packet among its h initial ones and can route the packet (or $n^2$ packets in total) to its
final destination within O(n) steps, then, by simply repeating this process h times, we can
achieve an h × O(n) = O(hn) step algorithm for h-h routing. Unfortunately, this observation is
too optimistic: In each round, $n^2$ packets in total are routed on the whole plane. If hn
(h < n) of those $n^2$ packets are destined for some single row/column and, to make matters
worse, if on the order of hn packets must pass through a single communication link, then each
round requires Ω(hn) steps and hence the whole algorithm takes Ω($h^2 n$) steps. So our basic
strategy is quite simple: Each processor chooses one packet at random, independently of the
others, and then moves it to its final destination. On average, the randomized selection
chooses approximately n packets per row/column, and it turns out that the above kind of bad
path-congestion is avoided with high probability.

Our routing method in each round is very similar to the algorithm of Iwama and Miyano shown in
[IM99]: (i) As mentioned above, each processor chooses one packet at random and moves it in
each round, i.e., $n^2$ packets move in total. (ii) Before routing those packets toward their
destinations, we change the order of packets in their flow to control the rate at which
packets are injected into the critical positions where heavy path-congestion can occur, by
using an idea based on the bit-reversal permutation ([Lei92], see below). (iii) Every packet
is routed to its final destination.

Definition 1. Let $i_1 i_2 \cdots i_t$ denote the binary representation of an integer i, and
let the reversal of i be the integer whose binary representation is $i_t i_{t-1} \cdots i_1$.
The bit-reversal permutation (BRP) π is a permutation from $[0, 2^t - 1]$ onto $[0, 2^t - 1]$
such that π(i) is the reversal of i. Let $x = x_0 x_1 \cdots x_{2^t - 1}$ be a sequence of
packets. Then BRP(x) is defined to be
$BRP(x) = x_{\pi(0)} x_{\pi(1)} \cdots x_{\pi(2^t - 1)}$.

When t = 3, i.e., when $x = x_0 x_1 \cdots x_7$, $BRP(x) = x_0 x_4 x_2 x_6 x_1 x_5 x_3 x_7$.
Namely, $x_j$ is placed at the π(j)th position in BRP(x) (the leftmost position is the 0th
position). The following lemma shown in [IM99] is important and is often used in the rest of
the paper:

Lemma 1 [IM99]. Let $x = x_0 x_1 \cdots x_{n-1}$ be a sequence where $n = 2^\ell$ for some
integer ℓ, and let $z = x_i x_{i+1} \cdots x_{i+k-1}$ be any subsequence of x of length k. Let
$x_{j_1}$, $x_{j_2}$ and $x_{j_3}$ be any three symbols in z that appear in BRP(x) in this
order. Then the distance between $x_{j_1}$ and $x_{j_3}$ is at least $\lfloor n/(2k) \rfloor$.
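
The bit-reversal permutation of Definition 1 is easy to compute directly. The sketch below is illustrative only; the function names are chosen for this example, and packets are represented by arbitrary Python objects in a list whose length must be a power of two.

```python
def bit_reverse(i, t):
    """Return the integer whose t-bit binary representation is that of i reversed."""
    r = 0
    for _ in range(t):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def brp(x):
    """Return BRP(x): position p of the result holds x[pi(p)]."""
    n = len(x)
    t = n.bit_length() - 1
    if n < 1 or n != 1 << t:
        raise ValueError("sequence length must be a power of two")
    return [x[bit_reverse(p, t)] for p in range(n)]
```

Because reversing the bits twice gives back the original integer, saying that position p holds $x_{\pi(p)}$ is the same as saying that $x_j$ ends up at position π(j), as stated after the definition.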

3.2 Algorithms 

Throughout this section, we assume that the side-length n of the 2-D mesh is divisible by six
and that the entire plane is divided into 36 subplanes, $SP_{1,1}$ through $SP_{6,6}$, as
shown in Figure 2-(a). However, the following argument can easily be extended to the case
where the side-length is not divisible by six. For simplicity, the total number of processors
in the 2-D mesh is hereafter denoted not by $n^2$ but by $36n^2$, i.e., each subplane consists
of n × n processors.

It is important to define the following two notations on sequences of packets on linear
arrays, which play key roles in our algorithms:

Definition 2. For a sequence x of n packets, $SORT(x) = x_{s_0} x_{s_1} \cdots x_{s_{n-1}}$
denotes the sequence sorted according to the destination column. Namely, SORT(x) is the
sequence such that the destination column of $x_{s_i}$ is farther than or the same as the
destination column of $x_{s_j}$ if i > j.

Definition 3. For a technical reason partly due to Lemma 1, it is desirable that the length of
a packet sequence be a power of two. Suppose, for example, that a sequence x includes only ten
packets $x_0 x_1 \cdots x_9$. In this case we change the length of x to 16 (= $2^4$) by
padding x out with six spaces or "null" packets, $\phi_0$ through $\phi_5$. If a null packet
exists at a processor $P_i$, then $P_i$'s queue is just empty. Namely, $PAD_4(x)$ is defined
to be $PAD_4(x) = \phi_0 \phi_1 \phi_2 \phi_3 \phi_4 \phi_5 x_0 x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9$
(the "4" of $PAD_4$ is the 4 of 16 = $2^4$). Generalization is easy. Hence,
$BRP(PAD_4(x)) = \phi_0 x_2 \phi_4 x_6 \phi_2 x_4 x_0 x_8 \phi_1 x_3 \phi_5 x_7 \phi_3 x_5 x_1 x_9$.
Namely, $x_j$ is placed at the $\pi((2^4 - 10) + j)$th position in $BRP(PAD_4(x))$ (or, in
general, at the $\pi((2^\ell - n) + j)$th position in $BRP(PAD_\ell(x))$).

Now we present our h-h routing algorithm on the mesh. Before routing starts, each processor
calculates the integer value ℓ such that $2^{\ell-1} + 1 \le n \le 2^\ell$. Then it repeats a
similar process for h rounds, corresponding to the number of initial packets. Here is a
detailed description of each round:

(1) Randomized Choice: Each processor chooses exactly one packet among the 
remaining initial packets at random, independently of each other, and moves it in this 
round. 

(2) Routing [IM99]: The routing process consists of the following 36 × 36 sequential phases.
In the first phase only packets whose sources and destinations are both in $SP_{1,1}$ move
(i.e., they may be only a small portion of the packets in $SP_{1,1}$). In the second phase
only packets from $SP_{1,1}$ to $SP_{1,2}$ move, and so on. Here we only give an outline of
each phase: Suppose that it is now the phase where packets from $SP_{4,3}$ to $SP_{2,4}$,
called active packets, move. The paths of those packets are shown by arrows in Figure 2-(a).
The entire phase is further divided into the following four stages:

Stage 1: The active packets first move horizontally to $SP_{4,1}$ and then vertically to
$SP_{5,1}$, i.e., they temporarily move into the subplane which is located three subplanes
away from the destination subplane both horizontally and vertically, without changing their
relative positions within the subplanes. See Figure 2-(b): For example, a packet a initially
placed at the upper-left corner of the source subplane always moves through the upper-left
corner position. All the active packets arrive at their temporary destinations in $SP_{5,1}$
exactly at the 3n-th step from the beginning of this phase.

Stage 2: They next go through three consecutive subplanes, from $SP_{5,1}$ to $SP_{5,3}$,
called the permutation zone, where the packets change their order. Namely, if the n processors
on some row in $SP_{5,1}$ originally held a sequence x of n packets, then the rightmost
$2^\ell$ processors on the row in $SP_{5,2}$ and $SP_{5,3}$ eventually hold the sequence
$BRP(PAD_\ell(SORT(x)))$ by using the method introduced in [IM99]. (The integer value ℓ has
been computed in the precomputation stage.) The reason for the rather long paths of packets
from $SP_{4,3}$ to $SP_{2,4}$ is to prepare this permutation zone.

Stage 3: The packets then move to $SP_{5,4}$, called the critical zone, where each packet
enters its correct column position. Here we apply the spacing operation, i.e., $c_0 - 1$ steps
are inserted between the first actions of any two neighboring processors ($c_0$ is some
constant and will be fixed later). In the first step, only the rightmost processor in the
permutation zone starts forwarding its active packet to the right, and then the packet keeps
moving one position to the right at each step. However, the other active packets do not move
at all during the first $c_0$ steps. In the $(c_0 + 1)$th step, the second rightmost processor
starts forwarding its packet to the right, in the $(2c_0 + 1)$th step the third rightmost
processor sends its packet, and so on. Once a packet starts, it keeps being shifted one
position to the right as long as it still needs to move rightward, and then changes direction
from rightward to upward at the crossing with its correct destination column, where turning
packets are always given a higher priority.

Stage 4: At each step, each processor moves its active packet upward if the packet still needs
to move.
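
The reordering of Stage 2 and the spacing rule of Stage 3 can be sketched for a single row as follows. This is a simplified illustration, not the distributed procedure of [IM99]: it computes the final order centrally, represents null packets by None, and assumes that a larger value returned by the caller-supplied `dest_col` function means a farther destination column. All function names are chosen for this example.

```python
def bit_reverse(i, bits):
    """Reverse the `bits`-bit binary representation of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def padded_exponent(n):
    """Smallest l with n <= 2**l; for n >= 2 this gives 2**(l-1) + 1 <= n <= 2**l."""
    return (n - 1).bit_length()


def pad(x, l):
    """PAD_l(x): prepend null packets (None) until the length is 2**l."""
    return [None] * ((1 << l) - len(x)) + list(x)


def brp(y, l):
    """BRP(y) for a sequence of length 2**l: position p holds y[pi(p)]."""
    return [y[bit_reverse(p, l)] for p in range(1 << l)]


def permuted_row(x, dest_col):
    """Order held by the rightmost 2**l processors after Stage 2: BRP(PAD_l(SORT(x)))."""
    l = padded_exponent(len(x))
    return brp(pad(sorted(x, key=dest_col), l), l)


def start_step(rank_from_right, c0):
    """Step at which a processor first forwards its packet in Stage 3.

    rank_from_right is 0 for the rightmost processor, 1 for the next, and so on.
    """
    return rank_from_right * c0 + 1
```

For ten packets that are already sorted, `permuted_row` reproduces the sequence $BRP(PAD_4(x))$ given in Definition 3, and `start_step` gives steps 1, $c_0 + 1$, $2c_0 + 1$ for the three rightmost processors.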
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3.3 Time Complexities 

Since each processor moves exactly one packet in each round, Stages 1 and 2 are linear in n.
We now investigate the time complexities of Stages 3 and 4.

Roughly speaking, if the destinations of the $n^2$ packets which move in each round are evenly
distributed, and hence the average number of packets which head for each single column is
bounded above by cn for some constant c, then the expected time complexities of Stages 3 and 4
could become O(n). Now we shall count the number of those packets and evaluate the probability
of bad behavior using the Chernoff bound (e.g., see [MR95]).

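Lemma 2. Let $X_1, X_2, \ldots, X_{n^2}$ be independent Poisson trials such that
$\Pr[X_i = 1] = p_i$, $\Pr[X_i = 0] = 1 - p_i$, $0 < p_i < 1$. Let $X = \sum_{i=1}^{n^2} X_i$
and $\mu = \sum_{i=1}^{n^2} p_i$. Then, for $0 < \delta \le 2e - 1$,

$$\Pr[X > (1 + \delta)\mu] < \exp\left(-\frac{\mu\delta^2}{4}\right).$$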



The probabilistic behavior of the randomized choice can be modeled as follows: Suppose that it
is now the phase of some round where packets move from $SP_s$ to $SP_d$ and that $SP_s$ has
$n^2$ processors, $P_1$ through $P_{n^2}$. Also suppose that processor $P_i$ initially held
$m_i$ packets (out of h) whose destinations are all on a single column, say COL, of the
subplane $SP_d$. When $P_i$ chooses one packet at random, let the random variable $X_i = 1$ if
the destination column of the packet is COL and $X_i = 0$ otherwise. Then
$\Pr[X_i = 1] = m_i/h$, and $\mu = \sum_{i=1}^{n^2} m_i/h \le n$ since the number of packets
whose destination column is COL is at most hn, i.e., $\sum_{i=1}^{n^2} m_i \le hn$. Apply the
Chernoff bound with $\delta = c_1\sqrt{\ln n/\mu}$ for some constant $c_1 > 0$. Then, for some
constant $c_2 > 0$,

$$\Pr[X > n + c_1\sqrt{n \ln n}] \le \Pr[X > \mu + c_1\sqrt{\mu \ln n}] < \exp(-c_2 \ln n) = n^{-c_2}.$$

Namely, the number of packets which are heading for a single column is at most
$n + c_1\sqrt{n \ln n}$ with probability $1 - n^{-c_2}$.
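
The constant $c_2$ can be made explicit. The following short derivation is a sketch that assumes Lemma 2 is used in the form $\exp(-\mu\delta^2/4)$ (the standard bound in [MR95]), with $\delta = c_1\sqrt{\ln n/\mu}$, and that this $\delta$ lies in the range allowed by the lemma:

$$
(1+\delta)\mu = \mu + c_1\sqrt{\mu \ln n}, \qquad
\exp\left(-\frac{\mu\delta^2}{4}\right) = \exp\left(-\frac{c_1^2 \ln n}{4}\right) = n^{-c_1^2/4},
$$

so one may take $c_2 = c_1^2/4$. Since $\mu \le n$, the threshold $n + c_1\sqrt{n \ln n}$ is at least $\mu + c_1\sqrt{\mu \ln n}$, which is why the same bound also covers the event $X > n + c_1\sqrt{n \ln n}$.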

Now consider the permutation zone, $SP_1$, $SP_2$ and $SP_3$, and the critical zone, $SP_4$.
After Stage 2, the permuted sequence $BRP(PAD_\ell(SORT(x)))$ of $2^\ell$ packets on every row
is now placed on the rightmost $2^\ell$ processors in $SP_2$ and $SP_3$. Suppose that the
uppermost row of the permutation zone includes $k_1$ $\alpha_1$'s, the second uppermost row
includes $k_2$ $\alpha_2$'s, and so on. Here the $\alpha_i$'s are packets whose destination
columns are the same, say, the jth column in the critical zone. In general, the ith row
includes $k_i$ $\alpha_i$'s.

Consider a processor $P_{i,j}$ at the cross-point of the ith row and jth column. It is
important to note that, from Lemma 1, $P_{i,j}$ receives at most two $\alpha_i$'s during any
particular window $\Delta_i$ of $c_0 \lfloor n/(2k_i) \rfloor$ ($\le c_0 \times n/(2k_i)$)
steps, since $c_0 - 1$ spaces are inserted between any two neighboring packets in the third
stage of the algorithm. If neither of those two $\alpha_i$'s can move up at some step, then
there must be some packet which "blocks" them, namely a packet which is ready to enter the jth
column by making a turn at some position above $P_{i,j}$. (Recall that turning packets have
priority at cross-points.) Let us call such a packet a blocking packet against the
$\alpha_i$'s. Note that a blocking packet against the $\alpha_i$'s never blocks them again.

We shall count the total number of $\alpha_m$'s (1 ≤ m ≤ i - 1) which $P_{1,j}$ through
$P_{i-1,j}$ can receive during the window $\Delta_i$. Since $P_{m,j}$ can receive at most two
$\alpha_m$'s in $c_0 \lfloor n/(2k_m) \rfloor$ steps, the number of $\alpha_m$'s which
$P_{m,j}$ receives during $\Delta_i$ is at most $2k_m/k_i$. Hence, the total number of
$\alpha$'s which $P_{1,j}$ through $P_{i-1,j}$ can receive is at most

$$\frac{2(k_1 + k_2 + \cdots + k_{i-1})}{k_i} \le \frac{2(n + c_1\sqrt{n \ln n})}{k_i} - 2 \qquad (1)$$

since $k_1 + k_2 + \cdots + k_{i-1} \le n + c_1\sqrt{n \ln n} - k_i$ with probability
$1 - n^{-c_2}$. Now we fix the value of $c_0$ such that

$$c_0 \ge 4\left(1 + c_1\sqrt{\frac{\ln n}{n}}\right).$$

Then $2(n + c_1\sqrt{n \ln n})/k_i = (2n/k_i)(1 + c_1\sqrt{\ln n / n}) \le c_0 n/(2k_i)$. Hence
the right side of the equation (1) is at most $c_0 n/(2k_i) - 2$, i.e., there must be at least
two time-slots such that no packets flow on the jth column during the window $\Delta_i$. This
means that there are no blocking packets at those time-slots, and therefore the two
$\alpha_i$'s currently held in $P_{i,j}$ can move up during the window $\Delta_i$. The same
argument applies for any j (1 ≤ j ≤ n) and for any window $\Delta_i$ (1 ≤ i ≤ n). As a result,
if the queue-size of each processor is at least two, then no delay happens in the critical
zone and hence Stages 3 and 4 can also be performed in linear time:

Theorem 1. There is an oblivious, randomized, h-h routing algorithm on 2-D meshes of
queue-size two which runs in O(hn) steps with high probability.



4 Application to Fault-Tolerant Routing 

In this section we consider the problem of permutation routing on p-faulty meshes under the
dynamic fault assumption, i.e., the failures are dynamic and their locations are not known
before routing starts. Under the same setting, Raghavan [Rag95] gave a randomized algorithm
for solving permutation routing on p-faulty meshes (p < 0.29) which runs in O(n log n) steps
with high probability, using queues of $O(\log^2 n)$ size. This algorithm is based on the
Valiant-Brebner randomized algorithm [VB81], in which each packet is first sent to a random
intermediate destination and then routed to its final position along the dimension-order path.
However, instead of sending each packet along its master path P as defined by the
Valiant-Brebner scheme, Raghavan's scheme broadcasts copies within a routing region R(P)
defined to contain all processors within distance c log n of P for some constant c: Namely, if
there is a path between the source and destination of a packet through live processors, then
its routing region contains the path, and hence every packet is routed successfully to its
destination with high probability, as shown in [Rag95]. We can prove that, by applying this
broadcast scheme to our h-h routing algorithm introduced in the previous section, we obtain a
randomized algorithm for permutation routing on p-faulty meshes whose running time slows down
by at most a factor of log n compared to the fault-free case.

Lemma 3. Suppose that a 2-D mesh includes $\frac{n}{c \log n} \times \frac{n}{c \log n}$
processors, and each processor requires c' log n steps to communicate (to send/receive a
single packet) with its neighboring processors, for some constants c and c'. Then, if the
randomized algorithm of the previous section can solve the $c^2 \log^2 n$-$c^2 \log^2 n$
routing problem on such a mesh within T steps using queues of constant size, then there exists
a randomized algorithm for the permutation routing problem on the p-faulty mesh of n × n
processors which runs within O(T log n) steps using queues of constant size.

Proof. Only an outline is given. The entire plane of the p-faulty mesh is divided into
$n^2/(c^2 \log^2 n)$ submeshes, i.e., each submesh has c log n × c log n processors. Recall
that in the algorithm of the previous section, each processor chooses one packet at random
among its $c^2 \log^2 n$ initial packets, and then routes the packet to its destination. We
regard one submesh of c log n × c log n processors on the p-faulty mesh as one processor of
the $\frac{n}{c \log n} \times \frac{n}{c \log n}$ fault-free mesh (referred to as the smaller
mesh hereafter) and simulate both the random choice and the routing process of a single
processor on the smaller mesh by the whole submesh of c log n × c log n processors on the
faulty mesh.

(1) We pick one packet from the c log n × c log n packets on the submesh as follows: Suppose
that some submesh consists of $P_1$ through $P_{c^2 \log^2 n}$. However, several processors
may have failed. (i) First of all, we count how many processors are alive. $P_1$ (if alive)
first broadcasts copies of its initial packet to those $c^2 \log^2 n$ processors by the method
of Raghavan using the first $c^2 \log^2 n$ steps. After those steps, every processor surely
receives the packet. Next $P_2$ broadcasts its packet to all the $c^2 \log^2 n$ processors
during the second $c^2 \log^2 n$ steps, and so on. However, if some processor has failed, then
all the processors do nothing at all during its $c^2 \log^2 n$ steps. Thus every processor
knows the number of live processors in the submesh. (ii) Suppose that r processors are still
alive. Then every processor chooses an integer from 1 through r at random and keeps the value.
(iii) Again all the processors broadcast their initial packets in the same way as in (i),
where if some processor has chosen a value i in (ii), then the processor stores only the ith
packet and does not store the other packets, i.e., it removes them from its queue (to reduce
the queue-size). (iv) $P_1$ broadcasts copies of the packet which it stored in (iii) to all
the processors of its submesh. However, if $P_1$ is dead then nothing happens. Namely, if no
packet is received during the $c^2 \log^2 n$ steps, then $P_2$ takes over and broadcasts
copies of the packet which $P_2$ chose at random in (ii) and (iii). If $P_2$ is also dead,
then $P_3$ takes over, and so on. Hence only one packet is chosen at random from the (live)
packets within polylog steps. Note that this packet, a, is now shared by all the processors in
the submesh, and there must be a processor which originally held a. That processor knows that
its packet was selected (and will be routed in the first round). In the second and following
rounds, this processor behaves as if it were dead. This completes the random selection process
of the first round. Note that this process is repeated in full in each round.

(2) Now we go to the routing process of the first round. Each packet moves along the same path
as in the previous randomized algorithm, but this time the path consists of a sequence of
submeshes. When a packet moves from one submesh to the next, it is broadcast to all the
c log n × c log n processors of each submesh. This can be done exactly as in [Rag95] and needs
O(log n) time.

(3) Repeat (1) and (2) for $c^2 \log^2 n$ rounds. For each round, we need polylog steps for
(1) and $(T/(c^2 \log^2 n)) \cdot c' \log n$ steps for (2). (Since
$c^2 \log^2 n$-$c^2 \log^2 n$ routing needs T steps, a single round needs
$T/(c^2 \log^2 n)$ steps.) Therefore the total computation time is O(T log n). □

Theorem 2. There is a randomized permutation routing algorithm on p-faulty meshes of
queue-size two which runs in $O(n \log^2 n)$ steps with high probability.

Proof. By Theorem 1, there is a randomized $c^2 \log^2 n$-$c^2 \log^2 n$ routing algorithm on
2-D meshes of $\frac{n}{c \log n} \times \frac{n}{c \log n}$ processors with queue-size two
which runs in O(n log n) steps with high probability. Now use Lemma 3 by substituting
T = O(n log n). □
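
The arithmetic behind these two substitutions can be spelled out as follows. This is a sketch under the reading that $h = c^2 \log^2 n$ and that the smaller mesh has side length $n/(c \log n)$; the lower-order cost of the random selection step is ignored.

$$
T = O\!\left(h \cdot \frac{n}{c \log n}\right) = O\!\left(c^2 \log^2 n \cdot \frac{n}{c \log n}\right) = O(n \log n),
$$

$$
\underbrace{c^2 \log^2 n}_{\text{rounds}} \cdot \underbrace{\frac{T}{c^2 \log^2 n} \cdot c' \log n}_{\text{steps per round}} = c'\, T \log n = O(n \log^2 n).
$$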

Theorem 3. There is a randomized permutation routing algorithm on p-faulty meshes of
queue-size k which runs in $O(n \log^2 n / k)$ steps with high probability.

Proof. Almost all steps are the same as above. However, k packets are chosen at random from
the c log n × c log n packets on every submesh by using the same idea as (1)-(i) in the proof
of Lemma 3, and those k packets are moved in each round. It can be shown that each round can
again be performed within O(n log n) steps since each queue-size is now k. Since there are
c log n/k rounds, the total running time is $O(n \log^2 n / k)$. Details are omitted. □



5 Concluding Remarks 

Apparently there are several problems for future research. Among others, the question of
whether the time bound of Theorem 2 can be improved is interesting. Note that our algorithm
works only if we can broadcast each packet to all processors in a submesh. We set the size of
the submesh to roughly log n × log n, but it then satisfies extra properties beyond what we
need. If we can reduce this size, then the time bound would immediately improve.

Figure 2: 36 subplanes

Abstract. The IP address lookup problem is one of the major bottlenecks in high performance
routers. Previous solutions to this problem first describe it in the general terms of longest
prefix matching and are then evaluated experimentally on real routing tables T. In this paper,
we follow the opposite direction. We start out from the experimental analysis of real data
and, based upon our findings, we provide a new and simple solution to the IP address lookup
problem. More precisely, our solution for m-bit IP addresses is a reasonable trade-off between
performing a binary search on T with O(log |T|) accesses, where |T| is the number of entries
in T, and executing a single access on a table of $2^m$ entries obtained by fully expanding T.
While the previous results start out from space-efficient data structures and aim at lowering
the O(log |T|) access cost, we start out from the expanded table with $2^m$ entries and aim at
compressing it without an excessive increase in the number of accesses. Our algorithm takes
exactly three memory accesses and occupies $O(2^{m/2} + |T|^2)$ space in the worst case.
Experiments on real routing tables for m = 32 show that the space bound is overly pessimistic.
Our solution occupies approximately one megabyte for the MaeEast routing table (which has
|T| ≈ 44,000 and requires approximately 250 KB) and, thus, takes three cache accesses on any
processor with 1 MB of L2 cache. According to measurements obtained with the VTune tool on a
Pentium II processor, each lookup requires 3 additional clock cycles besides the ones needed
for the memory accesses. Assuming a clock cycle of 3.33 nanoseconds and an L2 cache latency of
15 nanoseconds, a lookup in MaeEast can be estimated at 55 nanoseconds or, equivalently, our
method performs 18 million lookups per second.
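
The final estimate follows from simple arithmetic, assuming that all three memory accesses hit the L2 cache:

$$
3 \times 15\ \text{ns} + 3 \times 3.33\ \text{ns} \approx 55\ \text{ns},
\qquad
\frac{1}{55 \times 10^{-9}\ \text{s}} \approx 1.8 \times 10^{7}\ \text{lookups per second}.
$$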



1 Introduction 

Computer networks are expected to exhibit very high performance in delivering data because of
the explosive growth of Internet nodes (from 100,000 computers in 1989 to over 30 million
today). The network bandwidth, which measures the number of bits that can be transmitted in a
certain period of time, is thus continuously improved by adding new links and/or by improving
the performance of the existing ones. Routers are at the heart of the networks in that they
forward packets from input interfaces to output interfaces on the basis of the packets'
destination Internet address, which we simply call the IP address. They choose which output
interface corresponds to a given packet by performing an IP address lookup in their routing
table. As routers have to deal with an ever increasing number of links whose performance
constantly improves, the address lookup is now becoming one of the major bottlenecks in high
performance forwarding engines.

The IP address lookup problem was considered just a simple table lookup problem in the early
days of the Internet. Now, it is inconceivable to store all existing IP addresses explicitly
because, in this case, routing tables would contain millions of entries. In the early 1990s
people realized that the amount of routing information would grow enormously, and introduced a
simple use of prefixes to reduce space [3]. Specifically, IP protocols use hierarchical
addressing, so that a network contains several subnets which in turn contain several host
computers. Suppose that all subnets of the network with IP address 128.96.*.* have the same
routing information apart from the subnet whose IP address is 128.96.34.*. We can succinctly
describe this situation by just two entries (128.96 and 128.96.34) instead of many entries for
all possible IP addresses of the network. However, the use of prefixes introduces a new
dimension in the IP address lookup problem: For each packet, more than one table entry can
match the packet's IP address. In this case, the applied rule consists of choosing the longest
prefix match, where each prefix is a binary string that has a variable length from 8 to 32 in
IPv4 [14].

For example, let us consider the routing table shown below, where the first prefix, ε,
denotes the empty sequence corresponding to the default output interface, and assume
that the IP address of the packet to be forwarded is 159.213.37.2, that is,

10011111 11010101 00100101 00000010 in binary. 

Then, the longest prefix match is obtained with the fourth entry of the table and the 
packet is forwarded to output interface D. Instead, a packet whose IP address is 
159.213.65.15, that is, 10011111 11010101 01000001 00001111, is forwarded to output 
interface C. 
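
The two lookups of this example can be checked with a naive linear scan. This is only an illustrative sketch: the names `TABLE`, `to_bits` and `longest_prefix_match` are ours, and scanning every entry is exactly what a fast router must avoid.

```python
# Routing table of the example: (prefix as a bit string, output interface).
# The empty string plays the role of the default (empty) prefix.
TABLE = [
    ("", "A"),
    ("10011111", "B"),
    ("1001111111010101", "C"),
    ("100111111101010100", "D"),
    ("1001111111110", "E"),
]


def to_bits(dotted):
    """Convert a dotted-decimal IPv4 address into a 32-character bit string."""
    return "".join(format(int(octet), "08b") for octet in dotted.split("."))


def longest_prefix_match(table, address_bits):
    """Return the interface of the longest prefix matching the address."""
    matches = [(prefix, hop) for prefix, hop in table
               if address_bits.startswith(prefix)]
    # The default entry always matches, so `matches` is never empty.
    return max(matches, key=lambda pair: len(pair[0]))[1]


print(longest_prefix_match(TABLE, to_bits("159.213.37.2")))   # D
print(longest_prefix_match(TABLE, to_bits("159.213.65.15")))  # C
```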

Looking for the longest matching prefix in IP routing tables represents a challenging 
algorithmic problem since lookups must be answered very quickly. In order to sustain a
bandwidth of, say, 10 gigabits per second with an average packet length of 2,000 bits,
a router should forward 5 million packets per second. This means that each forwarding
has to be performed in approximately 200 nanoseconds and, consequently, each lookup
must be realized much faster.
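
The arithmetic behind these figures is easy to reproduce. The helper below is a hypothetical illustration (its name and signature are ours), assuming the packet length is expressed in bits.

```python
def per_packet_budget(bandwidth_bits_per_s, avg_packet_bits):
    """Return (packets per second, nanoseconds available per packet)."""
    packets_per_s = bandwidth_bits_per_s / avg_packet_bits
    return packets_per_s, 1e9 / packets_per_s


rate, budget_ns = per_packet_budget(10e9, 2000)
print(rate, budget_ns)  # 5000000.0 200.0
```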



Prefix                   Interface
ε                        A
10011111                 B
10011111 11010101        C
10011111 11010101 00     D
10011111 11110           E



1.1 Previous Results 

Several approaches have been proposed in the last few years in order to solve the IP 
address lookup problem. Hardware solutions, though very efficient, are expensive and 
some of them may become outdated quite quickly [6,10]. In this section, we will briefly 
review some of the most recent software solutions. 

A traditional implementation of routing tables [16] uses a version of Patricia tries,
a very well-known data structure [8]. In this case, it is possible to show that, for tables
with n random elements, the average number of examined bits is approximately $\lceil \log n \rceil$.
Thus, for n = 40,000, this value is 16. In real routing tables entries are not random
and have many initial bits in common, and so the average number of table accesses is
higher than $\lceil \log n \rceil$. When compared to the millions of address lookup requests served in
one second, these accesses are too many. Another variation of Patricia tries has been
proposed in order to examine k bits at a time [13]. However, this variation deals with
either the exact matching problem or the longest prefix matching problem restricted
to prefixes whose lengths are multiples of k. A more recent approach [11] inspired by
the three-level data structure of [2], which uses a clever scheme to compress multibit 
trie nodes using a bitmap, is based on the compression of the routing tables by means 
of level compressed tries, which are a powerful and space efficient representation of 
binary tries. This approach seems to be very efficient from a memory size point of 
view but it requires many bit operations which, on the current technology, are time 
consuming. Another approach is based on binary search on hash tables organized by 
prefix lengths [17]. This technique is more memory consuming than the previous one 
but, according to the experimental evaluation presented by the authors, it seems to be 
very fast. A completely different way of using binary search is described in [5]. It is 
based on multi-way search on the number of possible prefixes rather than the number 
of possible prefix lengths and exploits the locality inherent in processor caches. The 
most recent (and fastest) software solution to the IP address lookup problem is the one 
based on controlled prefix expansion [15]. This approach, together with optimization 
techniques such as dynamic programming, can be used to improve the speed of most 
IP lookup algorithms. When applied to trie search, it results in a range of algorithms
whose performance can be tuned and that, according to the authors, provide faster search
and faster insert/delete times than earlier lookup algorithms.

While the code of the solution based on level compressed tries is publicly available
along with the data used for the experimental evaluation, we could not find an analogous 
situation for the other approaches (this is probably due to the fact that the works related 
to some of these techniques have been patented). Thus, the only available experimental 
data for these approaches are those given by the authors. 

1.2 Our Results 

Traditionally, the proposed solutions to the IP address lookup problem aim at solving it
in an efficient way, whatever table is to be analyzed. In other words, these solutions
do not solve the specific IP address lookup problem but the general longest prefix match
problem. Subsequently, their practical behavior is evaluated experimentally on real routing tables.
In this paper we go the other way around. We start out from the real data and, as a 
consequence of the experimental analysis of these data, we provide a simple method 
whose performance depends on the statistical properties of routing tables T in a way 
different from the previous approaches. Our solution can also be used to solve the longest 
prefix match problem but its performance when applied to this more general problem is 
not guaranteed to be as good as in the case of the original problem. 

Any solution for m-bit IP addresses can be seen as a trade-off between performing a
binary search on T with $O(\log |T|)$ accesses (where |T| is the number of prefixes in T)
and executing a single access on a table of $2^m$ entries obtained by fully expanding T. The
results described in the previous section propose space-efficient data structures and aim
at lowering the $O(\log |T|)$ bound on the number of accesses. Our method, instead, starts
out from the fully expanded table with $2^m$ entries and aims at compressing it without an
excessive increase in the (constant) number of accesses. For this reason, we call it an
expansion/compression approach.

We exploit the fact that the relation between the $2^m$ IP addresses and the very
few output interfaces of a router is highly redundant for m = 32. By expressing this
relation by means of strings (thus expanding the original routing table), these strings can
be compressed (using the run-length encoding scheme) in order to provide an implicit
representation of the expanded routing table which is memory efficient. More importantly,
this representation allows us to perform an address lookup in exactly three memory
accesses independently of the IP address. Intuitively, the first two accesses depend on
the first and second half of the IP address, respectively, and provide an indirect access 
to a table whose elements specify the output interfaces corresponding to groups of IP 
addresses. 

In our opinion, the approach proposed in this paper is valuable for several reasons.
(i) It should work for all routers belonging to the Internet. The analysis of the data has been
performed on five databases which are made available by the IPMA project [7] and
contain daily snapshots of the routing tables used at some major network access points.
The largest database, called MaeEast, is useful to model a large backbone router while
the smallest one, called Paix, can be considered a model for an enterprise router (see the
first table of Sect. 3). (ii) It should be useful for a long period of time. The data have been
collected over a period longer than six months and no significant change in the statistical
properties we analyze has ever been encountered (see tables of Sect. 3). (iii) Its dominant
cost is really given by the number of memory accesses. As already stated, the method
always requires three memory accesses per lookup. It does not require anything else but
addressing three tables (see Theorem 6). For example, on a Pentium II processor, each
lookup requires only 3 additional clock cycles besides the ones needed for the memory
accesses. (iv) It can be easily implemented in hardware. Our IP address lookup algorithm
is very simple and can be implemented by a few simple low-level instructions.

The counterpart of all the above advantages is that the theoretical upper bound on
the worst-case memory size is $O(2^{m/2} + |T|^2)$, which may become infeasible. However,
we experimentally show that in this case theory is quite far from practice and we believe
that the characteristics of the routing tables will vary much more slowly than the rate at which
cache memory size will increase.

In order to compare our method against the ones described in the previous section,
we refer to the data that appeared in [15]. In particular, the first five rows of Table 1 are taken
from [15, Table 2]. The last two rows report the experimental data of our method, which
turns out to be the fastest.

As for the methods proposed in [15] based on controlled prefix expansion, we performed
the measurements by using the VTune tool [4] to compute the dynamic clock
cycle counts. According to these measurements, on a Pentium II processor each lookup
requires $3 \cdot clk + 3 \cdot M_D$ nanoseconds, where clk denotes the clock cycle time and $M_D$
is the memory access delay. If our data structure is small enough to fit into the L2 cache,
then $M_D = 15$ nanoseconds, otherwise $M_D = 75$ nanoseconds. The comparison between
our method and the best method of [15] is summarized in Table 2, where the first
row is taken from [15, Tables 9 and 10].

Method                  Average ns   Worst-case ns   Memory in KB
Patricia trie [16]      1500         2500            3262
6-way search [5]        490          490             950
Search on levels [17]   250          650             1600
Lulea [2]               349          409             160
LC trie [11]            1000         -               700
Ours (experimental)     172          -               960
Ours (estimated)        -            235             960

Table 1. Lookup times for various methods on a 300 MHz Pentium II with 512 KB of L2 cache
(values refer to the MaeEast prefix database as on Sept. 12, 1997)



Method                             512 KB L2 Cache   1 MB L2 Cache
Controlled prefix expansion [15]   196               181
Ours                               235               55

Table 2. Estimated MaeEast lookup times depending on cache size (clk = 3.33 nsec)
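
The two estimates for our method in Table 2 follow from the cost model of three extra clock cycles plus three memory accesses. A small sketch of that computation, where the function name and its default arguments are our own illustrative choices:

```python
def estimated_lookup_ns(clk_ns, mem_delay_ns, extra_cycles=3, accesses=3):
    """Estimated lookup time: extra_cycles * clk + accesses * memory delay."""
    return extra_cycles * clk_ns + accesses * mem_delay_ns


# clk = 3.33 ns (300 MHz); 75 ns when the tables do not fit in the L2 cache,
# 15 ns when they do.
print(round(estimated_lookup_ns(3.33, 75)))  # 235
print(round(estimated_lookup_ns(3.33, 15)))  # 55
```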



Due to lack of space, we focus on the lookup problem and we do not discuss the 
insert/delete operations. We only mention here that their realization may assume that the 
cache is divided into two banks. At any time, one bank is used for the lookup and the 
other is being updated by the network processor via a personal computer interface bus 
[12]. A preliminary version of our data structure construction on a 233 MHz Pentium II 
with 512 KB of L2 cache requires approximately 960 microseconds; we are confident
that the code can be substantially improved, thus decreasing this estimate.

We now give the details of our solution. The reader must keep in mind that the or- 
der of presentation of our ideas follows the traditional one (algorithms + experiments) 
but the methodology adopted for our study followed the opposite one (experiments + 
algorithms). Moreover, due to lack of space, we will not give the details of the imple- 
mentation of our procedures. However, the C code of the implementation and a technical 
report are available via anonymous ftp (see Sect. 3). 

2 The Expansion/Compression Approach 

In this section, we describe our approach to solve the IP address lookup problem in terms 
of m-bit addresses. It runs in two phases, expansion and compression. In the expansion 
phase, we implicitly derive the output interfaces for all the $2^m$ possible addresses. In
the compression phase, we fix a value $1 \le k \le m$ and find two statistical parameters
$\alpha_k$ and $\beta_k$ related to some combinatorial properties of the items in the routing table at
hand. We then show that these two parameters characterize the space occupancy of our 
solution. 

2.1 Preliminaries and Notations 

Given the binary alphabet $\Sigma = \{0, 1\}$, we denote the set of all binary strings of length k
by $\Sigma^k$, and the set of all binary strings of length at most m by $\Sigma^{\le m} = \bigcup_{k=0}^{m} \Sigma^k$. Given
two strings $\alpha, \beta \in \Sigma^{\le m}$ of length $k_\alpha = |\alpha|$ and $k_\beta = |\beta|$, respectively, we say that $\alpha$ is
a prefix of $\beta$ (of length $k_\alpha$) if the first $k_\alpha \le k_\beta$ bits of $\beta$ are equal to $\alpha$ (e.g., 101110 is
a prefix of 1011101101110). Moreover, we denote by $\alpha \cdot \beta$ the concatenation of $\alpha$ and
$\beta$, that is, the string whose first $k_\alpha$ bits are equal to $\alpha$ and whose last $k_\beta$ bits are equal to
$\beta$ (e.g., the concatenation of 101110 and 1101110 is 1011101101110). Finally, given
a string $\alpha$ and a subset S of $\Sigma^{\le m}$, we define $\alpha \cdot S = \{x \mid x = \alpha \cdot \beta \text{ with } \beta \in S\}$.

A routing table T relative to m-bit addresses is a sequence of pairs (p, h) where the
route p is a string in $\Sigma^{\le m}$ and the next-hop interface h is an integer in [1 ... H], with H
denoting the number of next-hop interfaces.¹ In the following we will denote by |T| the
size of T, that is, the number of pairs in the sequence. Moreover, we will assume that T
always contains the pair $(\epsilon, h_\epsilon)$, where $\epsilon$ denotes the empty string and $h_\epsilon$ corresponds to
the default next-hop interface.

The IP address lookup problem can be stated as follows. Given a routing table T
and $x \in \Sigma^m$, compute the next-hop interface $h_x$ corresponding to address x. That
interface is uniquely identified by the pair $(p_x, h_x) \in T$ for which (a) $p_x$ is a prefix of x
and (b) $|p_x| > |p|$ for any other pair $(p, h) \in T$ such that p is a prefix of x. In other
words, $p_x$ is the longest prefix of x appearing in T. The IP address lookup problem is
well defined since T contains the default pair $(\epsilon, h_\epsilon)$ and so $h_x$ always exists.



2.2 The Expansion Phase 

We describe formally the intuitive process of extending the routes of T that are shorter
than m in all possible ways by preserving the information regarding their corresponding
next-hop interfaces. We say that T is in decreasing (respectively, increasing) order if the
routes in its pairs are lexicographically sorted in that order. We take T in decreasing order
and number its pairs according to their ranks, so that the resulting pairs are numbered
$T_1, T_2, \ldots, T_{|T|}$, where $T_i$ precedes $T_j$ if and only if i < j. As a result, if $p_j$ is a prefix
of $p_i$ then $T_i$ precedes $T_j$. We use this property to suitably expand the pairs.

With each pair $T_i = (p_i, h_i)$ we associate its expansion set, denoted $EXP(T_i)$, to
collect all m-bit strings that have $p_i$ as a prefix. Formally,
$EXP(T_i) = (p_i \cdot \Sigma^{m - |p_i|}) \times \{h_i\}$, for $1 \le i \le |T|$. We then define the expansion of T on m bits,
denoted T', as the union $T' = \bigcup_{i=1}^{|T|} T'_i$, where the sets $T'_i$ are inductively defined as follows:
$T'_1 = EXP(T_1)$, and $T'_i = EXP(T_i) \ominus \bigcup_{1 \le j < i} T'_j$, where the operator $\ominus$ removes from $EXP(T_i)$ all pairs
whose routes already appear in the pairs of $\bigcup_{1 \le j < i} T'_j$. In this way, we fill the entries of
the expanded table T' consistently with the pairs in the routing table T, as stated by the
following result.

Fact 1. If $(p_x, h_x) \in T$ is the result of the IP address lookup for any m-bit string x,
then $(x, h_x) \in T'$.

It goes without saying that T' is made up of $2^m$ pairs and that, if we had enough
space, we could solve the IP address lookup problem with a single access to T'. We
therefore need to compress T' somehow.
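
For very small m the expansion phase can be carried out explicitly. The following is a minimal sketch under our own assumptions: the function name `expand`, the representation of T as a list of (bit string, next hop) pairs with distinct routes, and the use of a dictionary for T' are all illustrative and are not taken from the authors' C code.

```python
from itertools import product


def expand(table, m):
    """Return the expanded table T' as a dict: m-bit string -> next hop."""
    expanded = {}
    # Decreasing lexicographic order: a route precedes all of its prefixes.
    for prefix, hop in sorted(table, reverse=True):
        for tail in product("01", repeat=m - len(prefix)):
            address = prefix + "".join(tail)
            # Keep the entry written by an earlier (longer) route, if any;
            # this plays the role of the removal operator in the definition.
            expanded.setdefault(address, hop)
    return expanded


table = [("", 1), ("1", 2), ("10", 3), ("0110", 4)]
expanded = expand(table, 4)
print(len(expanded), expanded["1011"], expanded["1100"])  # 16 3 2
```

The dictionary has $2^m$ entries, which is why this direct construction is only usable for toy values of m.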

¹ We are actually using the term “routing table” to denote what is more properly called a
“forwarding table.” Indeed, a routing table contains some additional information.

2.3 The Compression Phase

This phase heavily relies on a parameter k to be fixed later on, where $1 \le k \le m$. We
are given the expanded table T' and wish to build three tables row_index, col_index
and interface to represent the same information as T' in less space by a simple run
length encoding (RLE) scheme [9].

We begin by clustering the pairs in T' according to the first k bits of their strings.
The cluster corresponding to a string $x \in \Sigma^k$ is $T'_{(x)} = \{(y, h_{xy}) \mid y \in \Sigma^{m-k} \text{ and } (x \cdot y, h_{xy}) \in T'\}$.
Note that its size is $|T'_{(x)}| = 2^{m-k}$. We can define our first statistical
parameter to denote the number of distinct clusters.

Definition 2. Given a routing table T with m-bit addresses, the row k-size of T for
$1 \le k \le m$ is

$$\alpha_k = |\{ T'_{(x)} \mid x \in \Sigma^k \}|.$$

Parameter $\alpha_k$ is the first measure of a simple form of compression. Although we expand
the prefixes shorter than k, we do not increase the number of distinct clusters as the
following fact shows.

Fact 3. For a routing table T, let $V_k$ be the number of distinct next-hop interfaces in all
the pairs with routes of length at most k, and let $U_k$ be the number of pairs whose routes
are longer than k, where $1 \le k \le m$. We have $\alpha_k \le V_k + U_k \le |T|$.

We now describe the compression based upon the RLE scheme. It takes a cluster $T'_{(x)}$
and returns an RLE sequence $s_{(x)}$ in two logical steps:

1. Sort $T'_{(x)}$ in ascending order with respect to the strings y and number its pairs
according to their ranks, obtaining $T'_{(x)} = \{(y_i, h_i)\}_{1 \le i \le 2^{m-k}}$.

2. Transform $T'_{(x)}$ into $s_{(x)}$ by replacing each maximal run $(y_i, h_i), (y_{i+1}, h_{i+1}), \ldots,$
$(y_{i+l}, h_{i+l})$, such that $h_i = h_{i+1} = \cdots = h_{i+l}$, by a pair $(h_i, l + 1)$, where $l + 1$ is
called the run length of $h_i$.
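
Once a cluster is sorted by y, step 2 only looks at the sequence of next hops. A minimal sketch of that step, assuming the cluster is given as the list of its next hops in ascending order of y (the name `rle_encode` is ours):

```python
def rle_encode(next_hops):
    """Replace each maximal run of equal next hops by (next_hop, run_length)."""
    runs = []
    for hop in next_hops:
        if runs and runs[-1][0] == hop:
            # Extend the current run.
            runs[-1] = (hop, runs[-1][1] + 1)
        else:
            # Start a new run of length 1.
            runs.append((hop, 1))
    return runs


print(rle_encode([1, 1, 1, 7, 7, 1, 1, 1]))  # [(1, 3), (7, 2), (1, 3)]
```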

The previous steps encode the $2^{m-k}$ pairs of strings and interfaces of each cluster $T'_{(x)}$
into a single and (usually) shorter sequence $s_{(x)}$. Note that, by Definition 2, $\alpha_k$
is the number of distinct RLE sequences $s_{(x)}$ so produced. We further process them to
obtain an equal number of equivalent RLE sequences $s'_{(x)}$. The main goal of this step is
to obtain sequences such that, for any i, the i-th pair of any two such sequences have the
same run length value. We show how to do it by means of an auxiliary function $\varphi(s, t)$
defined on two nonempty RLE sequences $s = (a, f) \cdot s_1$ and $t = (b, g) \cdot t_1$, whose total
sum of run lengths is equal, as follows:

$$\varphi(s, t) = \begin{cases}
t & \text{if } s_1 = t_1 = \epsilon, \\
(b, f) \cdot \varphi(s_1, t_1) & \text{if } f = g \text{ and } s_1, t_1 \ne \epsilon, \\
(b, f) \cdot \varphi(s_1, (b, g - f) \cdot t_1) & \text{if } f < g \text{ and } s_1, t_1 \ne \epsilon, \\
(b, g) \cdot \varphi((a, f - g) \cdot s_1, t_1) & \text{if } f > g \text{ and } s_1, t_1 \ne \epsilon.
\end{cases}$$

The purpose of $\varphi$ is to “unify” the run lengths of two RLE sequences by splitting
some pairs (b, f) into $(b, f_1), \ldots, (b, f_r)$, such that $f = f_1 + \cdots + f_r$. The unification
defined by $\varphi$ is a variant of standard merge, except that it is not commutative as it only
returns the (split) pairs in the second RLE sequence.

In order to apply unification to a set of RLE sequences $s_1, s_2, \ldots, s_q$, we define
function $\Phi(s_1, \ldots, s_q)$ that returns RLE sequences $s'_1, s'_2, \ldots, s'_q$ as follows. First,

$$s'_q = \varphi(\varphi(\ldots \varphi(\varphi(s_1, s_2), s_3) \ldots, s_{q-1}), s_q).$$

As a result of this step, we obtain that the run lengths in $s'_q$ are those common to all the
input sequences. Then, $s'_i = \varphi(s'_q, s_i)$ for i < q: in this way, the pairs of the set of RLE
sequences $s'_1, s'_2, \ldots, s'_q$ are equally split.
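
Under our reading of the two definitions above, the unification can be sketched as follows. The names `phi` and `big_phi`, the representation of an RLE sequence as a list of (next_hop, run_length) tuples, and the iterative loop that replaces the recursion are all our own assumptions, not the authors' implementation.

```python
from functools import reduce


def phi(s, t):
    """Split the runs of t at every run boundary of s or t; values come from t."""
    if sum(n for _, n in s) != sum(n for _, n in t):
        raise ValueError("total run lengths differ")
    result = []
    i = j = 0
    left_s, left_t = s[0][1], t[0][1]  # unconsumed part of the current runs
    while True:
        step = min(left_s, left_t)
        result.append((t[j][0], step))
        left_s -= step
        left_t -= step
        if left_s == 0:
            i += 1
            if i < len(s):
                left_s = s[i][1]
        if left_t == 0:
            j += 1
            if j < len(t):
                left_t = t[j][1]
        if i == len(s) and j == len(t):
            return result


def big_phi(seqs):
    """Unify a list of RLE sequences so that all share the same run lengths."""
    last = reduce(phi, seqs)  # phi(...phi(phi(s1, s2), s3)..., sq)
    return [phi(last, s) for s in seqs[:-1]] + [last]


unified = big_phi([[("A", 8)], [("A", 3), ("B", 5)], [("C", 4), ("A", 4)]])
print(unified[1])  # [('A', 3), ('B', 1), ('B', 4)]
```

In this example all three output sequences have run lengths 3, 1, 4, so their common length is 3.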

We are now ready to define the second statistical parameter. Given an RLE sequence
s, let its length |s| be the number of pairs in it. Regarding the routing table T, let us
take the $\alpha_k$ RLE sequences $s_1, s_2, \ldots, s_{\alpha_k}$ obtained by the distinct clusters $T'_{(x)}$ with
$x \in \Sigma^k$, and apply the unification $\Phi(s_1, s_2, \ldots, s_{\alpha_k})$ defined above.

Definition 4. Given a routing table T with m-bit addresses, the column k-size $\beta_k$
of T, for $1 \le k \le m$, is the (equal) length of the RLE sequences resulting from
$\Phi(s_1, s_2, \ldots, s_{\alpha_k})$. That is,

$$\beta_k = |s'_{\alpha_k}|.$$

Although we increase the length of the original RLE sequences, we have that $\beta_k$ is
still linear in |T|.

Fact 5. For $1 \le k \le m$, $\beta_k \le 3|T|$.

Proof. Let us number the distinct clusters from 1 to $\alpha_k$, and let $n_j$ be the number of
routes in cluster j, where $\sum_{j=1}^{\alpha_k} n_j = |T|$. The initial RLE sequence for cluster j has
length $2n_j + 1$. Indeed, the first route gives a length of 1, and each subsequent route
increases that length by 2. After $\Phi$ has been applied, each $\varphi(t, s)$ produced an RLE
sequence of length at most |s| + |t|. As a result, $\beta_k$ is the length of the last sequence,
and so $\beta_k$ is bounded by $\sum_{j=1}^{\alpha_k} (2n_j + 1) \le 2|T| + \alpha_k \le 3|T|$.

2.4 Putting All Together 

We can, finally, prove our result on storing a routing table T in a sufficiently compact
way to guarantee always a constant number of accesses. We let $\#\mathrm{bytes}(n)$ denote the
number of bytes necessary to store a positive integer n, and a word be sufficiently large
to hold $\max\{\log |T| + 2, \log H\}$ bits.

Theorem 6. Given a routing table T with m-bit addresses and H next-hop interfaces,
we can store T into three tables row_index, col_index and interface of total size

$$2^k \cdot \#\mathrm{bytes}(\alpha_k) + 2^{m-k} \cdot \#\mathrm{bytes}(\beta_k) + \alpha_k \cdot \beta_k \cdot \#\mathrm{bytes}(H)$$

in bytes or $O(2^k + 2^{m-k} + |T|^2)$ words, for $1 \le k \le m$, so that an IP address
lookup for an m-bit string x takes exactly three accesses given by

$h_x$ = interface[row_index[x[1 ... k]], col_index[x[k + 1 ... m]]]

where x[i ... j] denotes the substring of x starting from the i-th bit and ending at the
j-th bit.

Proof. We first find the $\alpha_k$ distinct clusters (without their explicit construction) and
produce the corresponding RLE sequences, which we unify by applying $\Phi$. The resulting
RLE sequences, of length $\beta_k$, are numbered from 1 to $\alpha_k$. At this point, we store the $\beta_k$
next-hop values of the j-th sequence in row j of table interface, which hence has $\alpha_k$
rows and $\beta_k$ columns. We then set row_index[x[1 ... k]] = j for each string x such that
cluster $T'_{(x[1 \ldots k])}$ has been encoded by the j-th RLE sequence. Finally, let $f_1, \ldots, f_{\beta_k}$
be the run lengths in any such sequence. We set col_index[x[k + 1 ... m]] = $\ell$ for each
string x, such that x[k + 1 ... m] has rank q in $\Sigma^{m-k}$ and $\sum_{t=1}^{\ell-1} f_t < q \le \sum_{t=1}^{\ell} f_t$.
The space and memory access bounds immediately follow. □
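
To make the construction in the proof concrete, here is a toy sketch for small m, assuming 1 ≤ k < m and a table given as a dictionary from bit-string prefixes to next hops that contains the empty prefix. All names (`naive_lookup`, `build_tables`, `lookup`) are hypothetical, indices are 0-based, the clusters are built explicitly (which the proof avoids), and the common split of the runs is computed directly as the union of all run boundaries, which is what we believe the unification yields.

```python
from itertools import product


def naive_lookup(table, x):
    """Reference answer: next hop of the longest prefix of x in the table."""
    return table[max((p for p in table if x.startswith(p)), key=len)]


def build_tables(table, m, k):
    heads = ["".join(b) for b in product("01", repeat=k)]
    tails = ["".join(b) for b in product("01", repeat=m - k)]  # ascending order
    # One cluster per k-bit head: the next hops of all its (m-k)-bit tails.
    clusters = {x: tuple(naive_lookup(table, x + y) for y in tails) for x in heads}
    distinct = sorted(set(clusters.values()))  # the alpha_k distinct clusters
    # A new column starts wherever some distinct cluster changes its next hop.
    starts = [q for q in range(len(tails))
              if q == 0 or any(c[q] != c[q - 1] for c in distinct)]
    row_of = {c: j for j, c in enumerate(distinct)}
    row_index = [row_of[clusters[x]] for x in heads]
    col_index, col = [], -1
    for q in range(len(tails)):
        if col + 1 < len(starts) and starts[col + 1] == q:
            col += 1
        col_index.append(col)
    interface = [[c[q] for q in starts] for c in distinct]  # alpha_k x beta_k
    return row_index, col_index, interface


def lookup(tables, x, k):
    """Three table accesses: row_index, col_index, interface."""
    row_index, col_index, interface = tables
    return interface[row_index[int(x[:k], 2)]][col_index[int(x[k:], 2)]]


table = {"": 1, "1": 2, "10": 3, "1011": 4, "101101": 5, "0110": 6, "00000001": 7}
tables = build_tables(table, m=8, k=4)
print(lookup(tables, "10110100", 4), len(tables[2]), len(tables[2][0]))  # 5 6 5
```

In this toy instance the interface table has 6 rows and 5 columns instead of the 256 entries of the fully expanded table.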

We wish to point out that Theorem 6 represents a reasonable trade-off between
performing a binary search on T with $O(\log |T|)$ accesses and executing a single access
on a table of $2^m$ entries obtained from T.

3 Data Analysis and Experimental Results 

In this section, we show that the space bound stated by Theorem 6 is practically feasible
for real routing tables of the Internet. In particular, we have analyzed the data of five
prefix databases which are made available by the IPMA project [7]: these data are daily
snapshots of the routing tables used at some major network access points. The largest
database, called MaeEast, is useful to model a large backbone router while the smallest
one, called Paix, can be considered a model for an enterprise router. The following
table shows the sizes of these routing tables on 7/7/98 and on 1/11/99 (the third column
indicates the minimum/maximum size registered between the previous two dates while
the fourth column indicates the number H of next-hop interfaces).



Router     7/7/98   1/11/99   Min/Max       H
MaeEast    41231    43524     37134/44024   62
MaeWest    18995    23411     17906/23489   62
Aads       23755    24050     18354/24952   34
PacBell    22416    22849     21074/23273   2
Paix       3106     5935      1519/5935     21



Besides suggesting the approach described in the previous section, the data of these 
databases have also been used to choose the appropriate value of k (that is, the way of 
splitting an IP address). The following table shows the percentages of routes of length 
16 and 24 on 7/7/98 (the third column denotes the most frequent among the remaining 
prefix lengths).

Router     Length 16   Length 24   Next
MaeEast    13%         56%         8% (23 bits)
MaeWest    14%         53%         7% (23 bits)
Aads       25%         54%         8% (23 bits)
PacBell    13%         56%         7% (23 bits)
Paix       12%         53%         8% (23 bits)



As can be seen from the table, these two lengths are the two most frequent in all
observed routing tables. (In other words, even though it is now allowed to use prefixes of
any length, it seems that network managers are still using the original approach of using
multiples of 8 according to the class categorization of IP addresses.) If we also consider
that choosing k as a multiple of 8 is better suited to the current technology, in which bit
operations are still time consuming, the data in the table suggest using either k = 16 or
k = 24. We choose k = 16 to balance the size of the tables row_index and col_index
(see Theorem 6). It now remains to estimate the value of the two statistical parameters
$\alpha_k$ and $\beta_k$. The next table shows these values measured on 7/7/98 and on 1/11/99 (the
third column shows the maximum values registered between the previous two dates).



Router     7/7/98 ($\alpha_k/\beta_k$)   1/11/99 ($\alpha_k/\beta_k$)   Maximum ($\alpha_k/\beta_k$)
MaeEast    2577/277                      2745/299                       2821/299
MaeWest    2017/263                      2335/268                       2335/285
Aads       1903/259                      2100/269                       2140/273
PacBell    1399/256                      1500/256                       1500/260
Paix       722/256                       984/261                        989/261



From the data of the previous table and from the registered values of H, it is then possible to
argue that $\#\mathrm{bytes}(\alpha_k) = \#\mathrm{bytes}(\beta_k) = 2$ and $\#\mathrm{bytes}(H) = 1$, so that, according
to the bound of Theorem 6, the memory occupancy of our algorithm is equal to
$M_1 = 2^{16} \cdot 2 + 2^{16} \cdot 2 + \alpha_k \cdot \beta_k$ bytes. Actually, the above memory size can be slightly increased in
order to speed up the access to the table interface: to this aim, the elements of table
row_index (respectively, col_index) can be seen as memory pointers (respectively,
offsets) instead of as indexes. In this way, the actual value of $\#\mathrm{bytes}(\alpha_k)$ becomes 4
and the corresponding memory occupancy increases to $M_2 = 2^{16} \cdot 4 + 2^{16} \cdot 2 + \alpha_k \cdot \beta_k$
bytes. According to these two formulae, we obtain the real memory occupancy of
our algorithm. These values are shown in the following table.
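
The two formulae are instances of the bound of Theorem 6 with k = 16 and m = 32, and can be evaluated directly from the measured parameters. In the sketch below the function name and keyword arguments are our own; the sample values are the MaeEast figures of 7/7/98.

```python
def memory_bytes(alpha_k, beta_k, k=16, m=32,
                 row_entry_bytes=2, col_entry_bytes=2, hop_bytes=1):
    """Size in bytes of row_index + col_index + interface (Theorem 6)."""
    return (2 ** k * row_entry_bytes
            + 2 ** (m - k) * col_entry_bytes
            + alpha_k * beta_k * hop_bytes)


m1 = memory_bytes(2577, 277)                     # indexes in row_index
m2 = memory_bytes(2577, 277, row_entry_bytes=4)  # pointers in row_index
print(m1, m2)  # 975973 1107045
```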



Router     7/7/98 ($M_1/M_2$)   1/11/99 ($M_1/M_2$)   Maximum ($M_1/M_2$)
MaeEast    975973/1107045       1082899/1213971       1082899/1213971
MaeWest    792615/923687        887924/1018996        887924/1018996
Aads       755021/886093        827044/958116         837804/968876
PacBell    620288/751360        646144/777216         646424/777496
Paix       446976/578048        518968/650040         518968/650040



The data relative to MaeEast seem to show that we need slightly more than 1 Megabyte
of cache memory. However, this is not true in practice: indeed, only a small fraction
(approximately 1/5) of table row_index is filled with used values from actual IP addresses.
Since only these values are loaded in the cache memory, this implies that the real cache
memory occupancy is much smaller than the one shown in the previous table.

We have implemented our algorithm in ANSI C and we have compiled it by using
the GCC compiler under the Linux operating system and by using the Microsoft Visual
C compiler under the Windows 95 operating system. The C source and the complete
data tables of our analysis are available at the URL
http://www.dsi.unifi.it/~piluc/IPLookup/, where we show that the generated
code is highly tuned and very little can be done to improve the performance of our lookup
process.

4 Conclusions 



In this paper, we have proposed a new method for solving the IP address lookup problem 
which originated from the analysis of the real data. This analysis may lead to different 
solutions. For example, we have observed that a well-known binary search approach
(based on secondary memory access techniques) requires on the average a little bit
more than one memory access for address lookup. However, this approach does not
always perform better than the expansion/compression technique presented in this paper:
the reason for this is that the implementation of the binary search requires too many 
arithmetic and branch operations which become the actual bottleneck of the algorithm. 
Nevertheless, there exist computer architectures for which this new solution performs 
better: actually, it is our intention to realize a serious comparison of the two approaches 
(comparison that fits into the recently emerging area of algorithm engineering). 

Theorem 6 can be generalized in order to obtain, for any integer r > 1, a space
bound $O(r 2^{m/r} + |T|^r)$ and r + 1 memory accesses. This could be the right way to
extend our approach to the coming IPv6 protocol family [1]. It would be interesting to
find a method for unifying the RLE sequences, so that the final space is provably less
than the pessimistic $O(|T|^r)$ one.

Finally, we believe that our approach raises several interesting theoretical questions.
For example, is it possible to derive combinatorial properties of the routing tables in
order to give better bounds on the values of $\alpha_k$ and $\beta_k$? What is the lower bound on
space occupancy in order to achieve a (small) constant number of memory accesses?
What about compression schemes other than RLE?

Abstract. In a hierarchical server environment, jobs must be assigned in an on-line
fashion to a collection of servers forming a hierarchy of capability. Each job
requests a specific server meeting its needs but the system is free to assign it either 
to that server or to any other server higher in the hierarchy. Each job carries a 
certain load, which it imparts to the server to which it is assigned. The goal is to 
minimize the maximum total load on a server. 

We consider the linear hierarchy, where the servers are totally ordered in terms 
of capability, and the tree hierarchy, where ancestors are more powerful than their 
descendants. We investigate several variants of the problem, differing from one
another by whether jobs are weighted or unweighted; whether they are permanent
or temporary; and whether assignments are fractional or integral. We derive upper
and lower bounds on the competitive ratio. 



1 Introduction 

One of the most basic on-line load-balancing problems is the following. Jobs arrive one 
at a time and each must be scheduled on one of n servers. Each job has a certain load 
associated with it and a subset of the servers on which it may be scheduled. These servers 
are said to be eligible for the job. The goal is to assign jobs to servers so as to minimize 
the cost of the assignment, which is defined as the maximum load on a server. 

The nature of the load balancing problem considered here is on-line: decisions must 
be made without any knowledge of future jobs, and previous decisions may not be 
revoked. We compare the performance of an on-line algorithm with the performance of 
an optimal off-line scheduler — one that knows the entire sequence of jobs in advance. 
The efficacy parameter of an on-line scheduler is its competitive ratio, roughly defined as 
the maximum ratio, taken over all possible sequences of jobs, between the cost incurred 
by the algorithm and the cost of an optimal assignment. 

1.1 The Hierarchical Servers Problem 

In the general setting, studied in [6,4,5], the sets of eligible servers are completely
arbitrary. In practical situations, however, this is almost never the case; very often we
find a mixed system in which certain servers are more powerful than others. In the
hierarchical servers problem the servers form a hierarchy of capability such that any job 
which may run on a given server may also run on any server higher in the hierarchy. We 
consider the linear hierarchy, in which the servers are numbered 1 through n and we 
imagine them to be physically ordered along a straight line running from left to right, 
with server 1 leftmost and server n rightmost. Leftward servers are more capable than 
rightward ones. We say that servers 1, . . . , s are to the left of s (note that s is to the left 
of itself), and that servers s + 1, . . . , n are to the right of s. 

The input is a sequence of jobs, each carrying a positive weight and requesting one
of the servers. A job requesting server s can be assigned to any of the servers to the left of
s. Thus, the job’s eligible servers are 1, ..., s. The assignment of a job with weight w to
server s increases the load on s by w (initially, all loads are 0). We use the terms ‘job’ and
‘request’ interchangeably. The cost of a given assignment is COST = $\max_s \{l_s\}$, where
$l_s$ is the load on server s. We use OPT to denote the cost of an optimal offline assignment.
An algorithm is c-competitive if there exists a constant b such that COST $\le c \cdot$ OPT $+ b$
for all input sequences.

We consider variants, or models, of the problem according to three orthogonal dicho- 
tomies. In the integral model each job must be assigned in its entirety to a single server, 
whereas in the fractional model a job’s weight may be split among several eligible ser- 
vers. In the weighted model jobs may have arbitrary positive weights, whereas in the 
unweighted model all jobs have weight equal to unity. Our results for the fractional model
hold for both the unweighted and weighted cases, so we do not distinguish between the 
unweighted fractional model and the weighted fractional model. Finally, permanent jobs 
continue to load the servers to which they are assigned indefinitely, whereas temporary 
jobs are only active for a finite duration, at the end of which they depart. The duration for 
which a temporary job is active is not known upon its arrival. When temporary jobs are 
allowed, the cost of an assignment is defined as $\mathrm{COST} = \max_t \max_s \{l_s(t)\}$, where
$l_s(t)$ is the load on server s at time t. The version of the problem which we view as basic
is the weighted integral model with permanent jobs only. 

A natural (and more realistic) generalized setting is one in which the servers form a 
(rooted) tree hierarchy: a job requesting a certain server may be assigned to any of its 
ancestors in the tree. The various models pertain to this generalization as well. 

In practical systems there are often several groups of identical servers (e.g. a computer 
network consisting of fifty identical PCs and three identical file servers). To model such
systems we can extend our formulation of the problem to allow each level in the hierarchy 
(be it linear or tree) to be populated by several equivalent servers. In this model, requests 
refer to equivalence classes of servers rather than to individual ones. For lack of space 
we omit the details, but it can be seen quite easily that both models are equivalent. 

The hierarchical servers problem is an important paradigm in the sense that it captures
many interesting applications from diverse areas. Among these are assigning classes
of service quality to calls in communication networks, routing queries to hierarchical
databases, signing documents by ranking executives, and upgrading classes of cars by
car rental companies.

The hierarchical servers problem is an instance of the general assignment problem
considered in [6], obtained by restricting the class of allowable eligible sets. (The class 
we consider is all sets of the form $\{1, \ldots, s\}$ with $1 \le s \le n$.) Since an $\Omega(\log n)$ lower
bound is known for the general setting [6], it is natural to ask for which (interesting)
classes of eligible sets we can get competitive factors better than $\Omega(\log n)$. The hierarchical
servers problem admits constant competitive factors and seems to be unique in
this respect; we know of no other non-trivial “natural” classes for which sub-logarithmic 
competitiveness is attainable. In fact, in the full paper we analyze three such classes, 
showing a logarithmic lower bound for each. 

A further motivation for studying the problem of hierarchical servers is its relation 
to the problem of related machines introduced in [2]. In this problem all servers are 
eligible for all jobs, but the servers may have different speeds: assigning a job of weight
$w$ to a server with speed $v$ increases its load by $w/v$. Without loss of generality, assume
$v_1 \ge v_2 \ge \cdots \ge v_n$, where $v_i$ is the speed of server $i$. Consider a set of jobs to be
assigned at a cost bounded by $C$ and let us focus on a particular job whose weight is
$w$. To achieve $\mathrm{COST} \le C$ we must refrain from assigning this job to any server $i$
for which $w/v_i > C$. In other words, there exists a rightmost server to which we may
assign the job. Thus, restricting the cost yields eligibility constraints similar to those in 
the hierarchical servers problem. 

1.2 Background 

Graham [10] explored the problem of assignment to identical machines, where each 
job may be assigned to any of the servers. He showed that the greedy algorithm has a 
competitive ratio of $2 - \frac{1}{n}$. Later work [7,8,11,1] investigated the exact competitive ratio
achievable for this problem for general n and for various special cases. The best results 
to date, for general n, are a lower bound of 1.852 and an upper bound of 1.923 [1]. 

Over the years many other load balancing problems were studied; see [3,12] for 
surveys. The assignment problem in which arbitrary sets of eligible servers are allowed 
was considered in [6]. They showed upper and lower bounds of $\Theta(\log n)$ for several
variants of this problem. Permanent jobs were assumed. Subsequent papers generalized
the problem to allow temporary jobs: in [4] a lower bound of $\Omega(\sqrt{n})$ and an upper bound
of $O(n^{2/3})$ were shown. The upper bound was later tightened to $O(\sqrt{n})$ [5].

The related machines problem was investigated in [2]. They showed an 8-competitive
algorithm based on the doubling technique. This result was improved in [9], where
a more refined doubling algorithm was shown to be $3 + 2\sqrt{2} \approx 5.828$-competitive.
By randomizing this algorithm, they were able to improve the bound to 4.311. They 
also showed lower bounds of 2.438 (deterministic) and 1.837 (randomized). In [5] the 
problem was generalized to allow temporary jobs; they showed an upper bound of 20, 
achieved by a doubling algorithm, and a lower bound of 3. Both the upper and lower 
bounds are deterministic. 

1.3 Our Results 

A significant portion of our work is devoted to developing a continuous framework in
which we recast the problem. The continuous framework is a fully fledged model, in
which a new variant of the problem is defined. The novelty of our approach lies in the
realization that weight and load are qualitatively distinct notions. Although the term load
is defined as a sum of weights, it is in fact more accurately interpreted as the density
of weight occurring at a server in the weight distribution defined by the assignment.
This distinction between load and weight, made explicit in the continuous model, is
unobserved in the problem definition due to the fact that the “volume” of each server
is equal to unity, and thus the numerical value of weight density (i.e. weight/volume)
coincides with that of total weight. 

In Sect. 2 we define a semi-continuous model and construct an e-competitive algorithm.
We then show how to transform any algorithm for the semi-continuous model into
an algorithm for the fractional model, and how to transform any algorithm for the fractional
model into an algorithm for the integral models (both weighted and unweighted).
We thus obtain an e-competitive algorithm for the fractional model and an algorithm for
the integral models which is e- and (e + 1)-competitive in the unweighted and weighted
cases, respectively.

In Sect. 3 we develop a procedure for deriving lower bounds in the context of the
continuous model. The lower bounds obtained with our procedure are also valid in the
discrete models (fractional as well as integral), even in the unweighted case with permanent
jobs only, and even with respect to randomized algorithms. Using our procedure
we find that e is a tight lower bound.

In Sect. 4 we consider temporary jobs in the integral model. We show an algorithm
which is 4- and 5-competitive in the unweighted and weighted cases, respectively. In the
full paper we also show a deterministic lower bound of 3. 

In the full paper we extend the problem to the tree hierarchy. We show an algorithm
which is respectively 4-, 4- and 5-competitive for the fractional, unweighted integral, and
weighted integral models. Randomizing this algorithm improves its competitiveness to e,
e and e + 1 respectively. We show deterministic and randomized lower bounds of $\Omega(\sqrt{n})$
for all models when temporary jobs are allowed. Our lower bound constructions, which
also apply in the general setting of arbitrary eligible sets, are considerably simpler than 
those presented in [4]. They are, however, less tight by small constant factors. 

In the full paper we also investigate the effect of restricting the sets of eligible servers 
in ways other than the linear and tree hierarchies. Namely, we consider the following 
three models: (1) the servers are on a line and eligible servers must be contiguous, (2) 
the servers form a tree such that if a server is eligible then so are its descendants, and 
(3) the eligible sets have some fixed cardinality. We show a logarithmic lower bound in
each case. 



2 Upper Bounds 



We show an algorithm whose respective versions for the fractional, unweighted integral
and weighted integral models are e-, e- and (e + 1)-competitive. The fractional version
admits temporary jobs; the integral versions do not. We build up to the algorithm by
introducing and studying the semi-continuous model and the class of memoryless algorithms.
We begin with the Optimum Lemma, which characterizes OPT in terms of the
input sequence.

Lemma 1 (Optimum Lemma). For a given input sequence, denote by $W_s$ the total
weight of jobs requesting servers to the left of s and let $H = \max_s \{W_s/s\}$. Let $w_{\max}$
be the maximum weight of a job in the input sequence. Then,

1. In the fractional model, $\mathrm{OPT} = H$.

2. In the unweighted integral model, $\mathrm{OPT} = \lceil H \rceil$.

3. In the weighted integral model, $\max\{H, w_{\max}\} \le \mathrm{OPT} \le H + w_{\max}$.
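
The quantities in the Optimum Lemma are easy to compute for a finite input. The following Python sketch is only an illustration: the function name `optimum_bounds` and the encoding of jobs as (requested server, weight) pairs are our own assumptions, not notation from the paper.

```python
def optimum_bounds(jobs, n):
    """Return (H, w_max) for jobs given as (requested_server, weight) pairs.

    Servers are numbered 1..n; W_s is the total weight requesting servers 1..s.
    """
    requested = [0] * (n + 1)          # requested[s] = weight requesting server s
    for s, w in jobs:
        requested[s] += w
    prefix, H = 0, 0
    for s in range(1, n + 1):
        prefix += requested[s]         # prefix now equals W_s
        H = max(H, prefix / s)
    w_max = max((w for _, w in jobs), default=0)
    return H, w_max


jobs = [(1, 2.0), (3, 1.0), (3, 3.0)]  # hypothetical input
H, w_max = optimum_bounds(jobs, n=3)
print("fractional OPT:", H)
print("weighted integral OPT lies in:", (max(H, w_max), H + w_max))
```

For this input $W_1 = 2$, $W_2 = 2$ and $W_3 = 6$, so $H = 2$ and the weighted integral optimum lies between 3 and 5.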

2.1 Memoryless Algorithms 

Generally speaking, a memoryless algorithm is an algorithm which assigns each job 
independently of previous jobs. Clearly, memoryless algorithms are only of interest in 
the fractional model, which shall therefore be the model on which we focus in this section. 
Note that the competitiveness of memoryless algorithms is immune to the presence of 
temporary jobs. If a memoryless algorithm is c-competitive with respect to permanent 
jobs, then it must remain c-competitive in the presence of temporary jobs, since at all 
times the momentary cost of its assignment cannot exceed c times the optimal cost of 
the active jobs. 

We focus on a restricted type of memoryless algorithms, which we name uniform algorithms.
Uniform memoryless algorithms are instances of the generic algorithm shown
below. Each instance is characterized by a function $u : \mathbb{N} \to (0, 1]$ satisfying $u(1) = 1$.



Algorithm GenericUniform

When a job of weight w, requesting server s, arrives:

1. Let r ← w and i ← s.

2. While r > 0:

3.     Assign a = min{w·u(i), r} units of weight to server i.

4.     r ← r − a.

5.     i ← i − 1.



The algorithm starts with the server requested by the job and proceeds leftward as 
long as the job is not fully assigned. The fraction of the job’s weight assigned to server i
is u(i), unless w·u(i) is more than what is left of the job when i is reached. The condition
u(1) = 1 ensures that the job’s weight will always be assigned in full by the algorithm.
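
A direct transcription of Algorithm GenericUniform into Python may help make the leftward sweep concrete. This is a minimal sketch: the function name `generic_uniform`, the dictionary return value, and the use of exact `Fraction` arithmetic in the example are our own choices.

```python
from fractions import Fraction


def generic_uniform(w, s, u):
    """Split one job of weight w requesting server s; return {server: weight}.

    u maps a server index to a fraction in (0, 1], with u(1) == 1 so that the
    remaining weight is always absorbed by server 1 at the latest.
    """
    assignment = {}
    r, i = w, s                       # r = weight still unassigned
    while r > 0:
        a = min(w * u(i), r)          # server i receives at most w * u(i)
        assignment[i] = a
        r -= a
        i -= 1                        # proceed leftward
    return assignment


# Example with u(i) = 1/i: a job of weight 6 requesting server 3.
print(generic_uniform(Fraction(6), 3, lambda i: Fraction(1, i)))
```

In the example server 3 receives 2 units, server 2 receives 3 units, and the last unit lands on server 1.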

Note that the assignment generated by a uniform memoryless algorithm is indepen- 
dent of both the number of servers and the order of the jobs in the input. We therefore 
assume that exactly one job requests each server (we allow jobs of zero weight) and that
the number of servers is infinite. We allow infinite request sequences for which the cost
is finite. Such sequences represent the limit behavior of the algorithm over a sequence
of finite input sequences. We denote the weight of the job requesting some server s by
$w_s$.

Consider a job of weight w requesting a server to the right of some server s. If the
requested server is close to s the job will leave w·u(s) units of weight on s regardless
of the exact server requested. At some point, however, if the request is made far enough
from s the weight assigned to s will begin to diminish as the distance of the request
from s grows. Finally, if the request is made “very” far away, it will have no effect on
s. We denote by $p_s$ the point beyond which the effect on s begins to diminish and by $p'_s$
the point at which it dies out completely, namely,

$$p_s = \max\Big\{ s' \;\Big|\; \sum_{i=s}^{s'} u(i) \le 1 \Big\} \quad\text{and}\quad p'_s = \max\Big\{ s' \;\Big|\; \sum_{i=s+1}^{s'} u(i) < 1 \Big\} .$$

Note that $p_s$ and $p'_s$ may be undefined, in which case we take them to be infinity.
We are only interested in functions u for which $p_s$ is finite. The importance of $p_s$ stems
from the fact that the load on s due to jobs requesting servers in the range $s, \ldots, p_s$ is
simply u(s) times the total weight of these jobs.



Lemma 2 (Worst Case Lemma). Let A be a uniform memoryless algorithm. The problem:

Given K > 0 and some server s, find an input sequence that maximizes the
load on s in A’s assignment, subject to OPT = K

is solved by the following sequence of jobs:

$$w_i = \begin{cases} 0 & 1 \le i < p_s \\ p_s K & i = p_s \\ K & p_s < i \le p'_s \\ 0 & i > p'_s \ (\text{if } p'_s < \infty) \end{cases} \qquad (1)$$

and $l_s$, the resultant load on s, satisfies $p_s K u(s) \le l_s \le p'_s K u(s)$.



Corollary 3. Let A be a uniform memoryless algorithm defined by u whose competitive
ratio is $c_A$. Then, $\sup_s \{p_s u(s)\} \le c_A \le \sup_s \{p'_s u(s)\}$.
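
To get a feel for the two bounds in Corollary 3, one can compute the turning points numerically. The sketch below assumes the reading $p_s = \max\{s' : \sum_{i=s}^{s'} u(i) \le 1\}$ and $p'_s = \max\{s' : \sum_{i=s+1}^{s'} u(i) < 1\}$; the function name `turning_points` and the `limit` cut-off (standing in for "infinity") are hypothetical.

```python
from fractions import Fraction


def turning_points(u, s, limit=10_000):
    """Return (p_s, p'_s) for the uniform algorithm defined by u.

    Both sums grow with s' because u > 0, so a linear scan finds the maxima.
    A result equal to `limit` means the scan was cut off.
    """
    total, p = u(s), s                 # total = sum_{i=s}^{p} u(i)
    while p < limit and total + u(p + 1) <= 1:
        p += 1
        total += u(p)
    tail, p_prime = 0, s               # tail = sum_{i=s+1}^{p'} u(i)
    while p_prime < limit and tail + u(p_prime + 1) < 1:
        p_prime += 1
        tail += u(p_prime)
    return p, p_prime


def harmonic(i):
    return Fraction(1, i)


for s in (1, 2, 10, 100):
    p, p_prime = turning_points(harmonic, s)
    print(s, float(p * harmonic(s)), float(p_prime * harmonic(s)))
```

For u(i) = 1/i both printed quantities can be compared against e as s grows, which is the discrete counterpart of the semi-continuous analysis that follows.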



2.2 The Semi-continuous Model 

In both the fractional and integral versions of the problem the servers and the jobs are 
discrete objects. We therefore refer to these models as the discrete models. In this section 
we introduce the semi-continuous model, in which the servers are made continuous. In 
section 3 we make the jobs continuous as well, resulting in the continuous model. 

The semi-continuous model is best understood through a physical metaphor. Consi- 
der the bottom of a vessel filled with some non-uniform fluid applying varying degrees 
of pressure at different points. The force acting at any single point is zero, but any region 
of non-zero area suffers a net force equal to the integral of the pressure over the region. 
Similarly, in the semi-continuous model we do not speak of individual servers; rather, 
we have a continuum of servers, analogous to the bottom of the vessel. An arriving job is 
analogous to a quantity of fluid which must be added to the vessel. The notions of load 
and weight become divorced; load is analogous to pressure and weight is analogous to 
force. 

Formally, the server interval is $(0, \infty)$, to which jobs must be assigned. Job j has
weight $w_j$ and it requests a point $s_j$ in the server interval. The assignment of job j is
specified by an integrable function $g_j : (0, \infty) \to [0, \infty)$ satisfying $\int_0^\infty g_j(x)\,dx = w_j$
and $x > s_j \Rightarrow g_j(x) = 0$. The assignment of a sequence of jobs is $g = \sum_j g_j$. The
load $l_I$ on a non-empty interval $I = (x_0, x_1)$ is defined as $l_I = \frac{1}{x_1 - x_0} \int_{x_0}^{x_1} g(x)\,dx$, the
mean weight density over I. The load at a point x is defined as $\sup_{x \in I} \{l_I\}$. The cost of
an assignment is $\mathrm{COST} = \sup_I \{l_I\}$.

Lemma 4 (Optimum Lemma, Semi-continuous Model). Let $W(x)$ be the total weight
of requests made to the left of x (including x itself). Then $\mathrm{OPT} = \sup_x \{W(x)/x\}$.

Let us adapt the definition of uniform memoryless algorithms to the semi-continuous
model. In this model a uniform algorithm is characterized by a function $u : (0, \infty) \to (0, \infty)$
as follows. For a given point x, let $q(x)$ be the point such that $\int_{q(x)}^{x} u(z)\,dz = 1$.
Then the assignment of job j is $g_j(x) = w_j u(x)$ for $q(s_j) \le x \le s_j$ and $g_j(x) = 0$
elsewhere. For this to work we must require that $\int_0^t u(x)\,dx = \infty$ for all t > 0. We
define $p(x)$ as the point such that $\int_x^{p(x)} u(z)\,dz = 1$. If such a point does not exist then
the algorithm’s competitive ratio cannot be bounded, as demonstrated by the infinite
request sequence in which all jobs have weight equal to unity and the j’th job requests
the point $s_j = j$. Clearly, OPT = 1, whereas the algorithm’s assignment places an
infinite load at x. We shall therefore allow only algorithms such that $\int_M^\infty u(x)\,dx = \infty$
for all M > 0.

Lemma 5 (Worst Case Lemma, Semi-continuous Model). Let A be a uniform algorithm
defined by u(x). The problem:

Given K > 0 and some point s, find an input sequence that maximizes the load
on s in A’s assignment, subject to OPT = K

is solved by a single job of weight p(s)K requesting the point p(s). The resultant load
at s is p(s)K u(s).

Corollary 6. The competitive ratio of A is $\sup_x \{p(x)u(x)\}$.



2.3 An e-competitive Algorithm 

Consider Algorithm Harmonic, the uniform memoryless algorithm defined by $u(x) = 1/x$.
Let us calculate $p(x)$:

$$1 = \int_x^{p(x)} \frac{dz}{z} = \ln \frac{p(x)}{x} \qquad (2)$$

$$p(x) = ex . \qquad (3)$$

Thus, the competitive ratio of Algorithm Harmonic is $\sup_x \{ex \cdot \frac{1}{x}\} = e$. In Sect. 3 we
show that e is a lower bound, hence Algorithm Harmonic is optimal.
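
As a sanity check on the choice $u(x) = 1/x$, one may repeat the calculation for the scaled family $u_c(x) = c/x$ with $c > 0$. This family is our own illustration and is not discussed in the text; it satisfies both divergence requirements on u, and Corollary 6 then suggests that $c = 1$ is the best scaling within it.

```latex
% Hypothetical family u_c(x) = c/x, c > 0 (illustration only)
\begin{align*}
1 &= \int_x^{p(x)} \frac{c}{z}\,dz = c \ln\frac{p(x)}{x}
   \quad\Longrightarrow\quad p(x) = e^{1/c}\,x, \\
\sup_x \{p(x)\,u_c(x)\} &= c\,e^{1/c}, \\
\frac{d}{dc}\bigl(c\,e^{1/c}\bigr) &= e^{1/c}\Bigl(1 - \frac{1}{c}\Bigr)
   \quad\Longrightarrow\quad \text{minimum at } c = 1 \text{ with value } e .
\end{align*}
```

The derivative is negative for $c < 1$ and positive for $c > 1$, so within this family the ratio $c\,e^{1/c}$ is smallest exactly for Algorithm Harmonic.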



2.4 Application to the Discrete Models 

We show how to transform any algorithm for the semi-continuous model into an algo- 
rithm for the (discrete) fractional model, and how to transform any algorithm for the 
fractional model into an algorithm for the integral models.

Semi-continuous to Fractional. Let A be a c-competitive online algorithm for the
semi-continuous model. Define algorithm B for the fractional model as follows. When
job j arrives, B assigns $\int_{i-1}^{i} g_j(x)\,dx$ units of weight to server i, for all i, where $g_j$ is
the assignment function generated by A for the job. Clearly, the cost incurred by B is
bounded by the cost incurred by A, and thus B is c-competitive.

An important observation is that if A is memoryless, then so is B. Thus, even if 
temporary jobs are allowed, the assignment generated by B will be c-competitive at all 
times, compared to an optimal (off-line) assignment of the active jobs. 
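
For Algorithm Harmonic the transformation can be written in closed form. The sketch below assumes that server i corresponds to the unit interval (i − 1, i] and that a request for server s is mapped to the point s; under these assumptions the job’s density is w/x on [s/e, s], and server i receives the integral of that density over its interval. The function name `harmonic_fractional` is hypothetical.

```python
import math


def harmonic_fractional(w, s):
    """Fractional shares {server: weight} induced by semi-continuous Harmonic.

    The job of weight w requesting point s is spread with density w/x over
    [s/e, s]; server i collects the part falling inside (i-1, i].
    """
    left_end = s / math.e                       # q(s) for u(x) = 1/x
    shares = {}
    for i in range(s, 0, -1):
        a, b = max(i - 1, left_end), min(i, s)  # overlap of (i-1, i] with [s/e, s]
        if b <= a:
            break                               # nothing reaches server i or beyond
        shares[i] = w * math.log(b / a)         # integral of w/x from a to b
    return shares


shares = harmonic_fractional(5.0, 7)
print(shares, sum(shares.values()))
```

The shares telescope to w·ln(s / (s/e)) = w, so the whole job is assigned, and only servers within a factor e of the requested one are touched.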



Fractional to Integral. Let A be an algorithm for the fractional model. Define algo- 
rithm B for the integral model (both weighted and unweighted) as follows. As jobs arrive, 
B keeps track of the assignments A would make. A server is said to be overloaded if its 
load in B’s assignment exceeds its load in A’s assignment. When a job arrives, B assigns
it to the rightmost eligible server which is not overloaded (after A is allowed to assign 
the job). 

Proposition 7. Algorithm B is well defined, i.e. whenever a job arrives, at least one of 
its eligible servers is not overloaded. Moreover, if A is c-competitive then B is c- and
(c + 1)-competitive in the unweighted and weighted models, respectively.
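
The rounding scheme can be sketched in a few lines. In the code below the fractional algorithm A is taken to be GenericUniform with u(i) = 1/i, evaluated in exact rational arithmetic so that the "overloaded" comparison is not disturbed by rounding errors; the names `harmonic` and `round_to_integral` and the job encoding are our own assumptions.

```python
from fractions import Fraction


def harmonic(w, s):
    """Fractional algorithm A: GenericUniform with u(i) = 1/i."""
    shares, r, i = {}, w, s
    while r > 0:
        a = min(w * Fraction(1, i), r)
        shares[i] = a
        r -= a
        i -= 1
    return shares


def round_to_integral(jobs, n, fractional=harmonic):
    """Algorithm B: place each job whole, guided by A's fractional loads.

    jobs is a sequence of (requested_server, weight). Returns the chosen
    server for every job and B's final loads on servers 1..n.
    """
    load_a = [0] * (n + 1)
    load_b = [0] * (n + 1)
    placement = []
    for s, w in jobs:
        for i, a in fractional(w, s).items():   # let A assign the job first
            load_a[i] += a
        # rightmost eligible server whose load in B does not exceed its load in A
        target = next(i for i in range(s, 0, -1) if load_b[i] <= load_a[i])
        load_b[target] += w
        placement.append(target)
    return placement, load_b[1:]


placement, loads = round_to_integral([(3, 1), (3, 1), (3, 1), (2, 1)], n=3)
print(placement, loads)
```

The `next(...)` call relies on Proposition 7: with exact arithmetic some eligible server is always found. With floating-point loads a small tolerance in the comparison would be advisable.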

3 Lower Bounds 

In this section we outline a technique for proving lower bounds. The bounds obtained 
are valid in both the fractional and integral models — even in the unweighted case. In 
fact, they remain valid even in the presence of randomization with respect to oblivious
adversaries. Using this technique, we obtain a tight constant lower bound of e. The 
success of our approach is facilitated by transporting the problem from the discrete 
setting into the continuous model, in which both jobs and servers are continuous. 

3.1 A Simple Lower Bound 

We consider the fractional model, restricting our attention to right-to-left input sequen- 
ces: sequences in which for all i < j, all requests for server j are made before any 
request for server i. We further restrict our attention to sequences in which each server 
is requested exactly once (although we now allow jobs of zero weight). 

Let A be a k-competitive algorithm, and consider some input sequence. Denote by $w_s$
the weight of the job requesting server s, and by $l_s$ the load on server s at a given moment.
Recall the definition of H in the Optimum Lemma. Suppose the first n − i + 1 jobs
(culminating with the job requesting server i) have been assigned by A. Define $h_i$ as the
value of H for this prefix of the input sequence. For $j \ge i$, define $h_{i,j} = \frac{1}{j} \sum_{m=i}^{j} w_m$. We
have $h_i = \max_{i \le j \le n} \{h_{i,j}\}$. Since A is k-competitive, the loads must obey $l_s \le k h_i$,
for all s.

Now consider the specific input sequence defined by $w_1 = \cdots = w_n = w$, for some
$w > 0$. For this sequence we have $h_i = (n - i + 1)w/n$ for all i. Thus, after the first job is
assigned we have $l_n \le kw/n$. After the second job is assigned we have $l_{n-1} \le 2kw/n$,
but $l_n \le kw/n$ still holds, because the new job could not be assigned to server n. In
general, after the request for server s is processed, we have $l_i \le (n - i + 1)kw/n$ for
all $i \ge s$. Noting that the total weight of jobs in the input equals the total load on servers
in the cumulative assignment, we get

$$nw = \sum_{i=1}^{n} l_i \le \sum_{i=1}^{n} (n - i + 1) \frac{kw}{n} = \frac{kw}{n} \sum_{j=1}^{n} j = \frac{kwn(n+1)}{2n} .$$

Hence, $k \ge 2 - \frac{2}{n+1}$. Thus, 2 is a constant lower bound.



3.2 Discussion 

Figure 1 depicts the request sequence and the resultant $k h_i$’s in histogram-like fashion,
with heights of bars indicating the respective values. The bars are of equal width, so we
can equivalently consider their area rather than height. To be precise, let us redraw the
histograms with bars of width 1 and height equal to the numerical values they represent.
Then, the total weight to be assigned is the total area of the job bars, and the total weight
actually assigned is bounded from above by the total area of the $k h_i$ bars. Now, instead
of drawing a histogram of $k h_i$, let us draw a histogram of $h_i$. Then, the lower bound is
found by solving

$$\text{total area of job bars} \le k \cdot \text{total area of } h_i \text{ bars} ,$$

$$k \ge \frac{\text{total area of job bars}}{\text{total area of } h_i \text{ bars}} .$$



Fig. 1. (a) Histogram of job weights; w = 12. (b) Histogram of $k h_i$; k = 3.



These considerations are independent of the specific input sequence at hand, so we 
have actually developed a general procedure for obtaining lower bounds. Select an input 
sequence and plot its histogram and the histogram of the resultant $h_i$’s. Divide the area
of the former by the area of the latter to obtain a lower bound. 
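
The procedure is mechanical enough to be scripted. The sketch below takes the weights of a right-to-left sequence in which each server is requested exactly once and returns the ratio of the two histogram areas; the function name `lower_bound` and the list encoding (entry i − 1 holds the weight of the job requesting server i) are assumptions made for the illustration.

```python
from fractions import Fraction


def lower_bound(weights):
    """Area of the job histogram divided by the area of the h_i histogram.

    weights[i-1] is the weight of the job requesting server i; jobs arrive
    right to left, so h_i = max over j >= i of (w_i + ... + w_j) / j.
    """
    n = len(weights)
    area_h = 0
    for i in range(1, n + 1):
        running, h_i = 0, 0
        for j in range(i, n + 1):
            running += weights[j - 1]
            h_i = max(h_i, running / j)      # running / j is h_{i,j}
        area_h += h_i
    return sum(weights) / area_h


# Uniform sequence with n = 4 and w = 12.
print(lower_bound([Fraction(12)] * 4))
```

For the uniform sequence the $h_i$ values are 12, 9, 6 and 3, so the ratio is 48/30 = 8/5, in agreement with the bound 2n/(n + 1) obtained for equal weights.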

Scaling both histograms by the same factor does not affect the ratio of areas, so 
we can go one step further and cast the procedure in purely geometric terms. Take as 
input a histogram where the width of each bar is $1/n$ and the height of the tallest bar is
1. Let $h_{i,j}$ be the area of bars i through j divided by $j/n$ (the width of j bars), and let
$h_i = \max_{i \le j \le n} \{h_{i,j}\}$. Divide the area of the input histogram by the area of the $h_i$
histogram to obtain the corresponding lower bound.

3.3 The Continuous Model

The continuous model is motivated by the geometric interpretation outlined in the pre- 
vious section. It differs from the semi-continuous model introduced in Section 2 in two 
ways. First, in contrast with the semi-continuous model, we now use a finite server 
interval [0, S]. The second, more important, difference is that in the continuous model
requests are not discrete; rather, they arrive over time in a continuous flow. 

Formally, we have a server interval [0, S], and a time interval [0, T] during which the
request flow arrives. Instead of a request sequence we have a request function $f(x, t)$.
For $t \in [0, T]$, $f(x, t)$ is an integrable non-negative real function of x defined over [0, S].
The interpretation of $f(x, t)$ is by means of integration, i.e. $\int_{x_0}^{x_1} f(x, t)\,dx$ represents the
total amount of weight requesting points in the interval $[x_0, x_1]$ up to time t. We express
the fact that requests accumulate over time by requiring that $t' > t \Rightarrow f(x, t') \ge f(x, t)$
for all x. The assignment function $g(x, t)$ is defined similarly: for $t \in [0, T]$, $g(x, t)$ is an
integrable non-negative real function of x defined over [0, S], and $\int_{x_0}^{x_1} g(x, t)\,dx$ is the
total weight assigned to interval $[x_0, x_1]$ up to time t. Assigned weight accumulates too, so
g must obey $t' > t \Rightarrow g(x, t') \ge g(x, t)$. We express the fact that weight may be assigned
to the left of the point it requested but not to its right, by demanding that for all $x'$ and t,
$\int_{x'}^{S} g(x, t)\,dx \le \int_{x'}^{S} f(x, t)\,dx$, with equality for $x' = 0$ (which expresses our desire not
to assign more weight than was requested). To make assignments irrevocable, we require
that for all $x'$ and $t < t'$,
$\int_{x'}^{S} g(x, t')\,dx - \int_{x'}^{S} g(x, t)\,dx \le \int_{x'}^{S} f(x, t')\,dx - \int_{x'}^{S} f(x, t)\,dx$.

An on-line algorithm in the continuous model is an algorithm which, given f{x, t), 
outputs $g(x, t)$ such that for all $\tau \in [0, T]$, $g(x, t)$ in the region $[0, S] \times [0, \tau]$ is independent
of $f(x, t)$ outside that region. The cumulative assignment is $g(x) = g(x, T)$.
An offline assignment is a function g{x) that satisfies the conditions for representing an 
assignment at time T (those conditions relating assignments at different times notwith- 
standing). The definitions of load and cost are identical to those in the semi-continuous 
model. 

Lemma 8 (Optimum Lemma, Continuous Model).

$$\mathrm{OPT} = \sup_{x' \in (0, S]} \left\{ \frac{1}{x'} \int_0^{x'} f(x, T)\,dx \right\} . \qquad (5)$$

While the continuous model is an interesting construction in its own right, we focus 
here on the aspects relevant to the discrete problem. Assume S = T = 1. A right-to-left 
request function is one that satisfies $f(x, t) = 0$ for $t < 1 - x$, and $f(x, t) = f(x, 1 - x)$
for $t \ge 1 - x$. Thus, $f(x, t)$ is completely defined by specifying $f(x, 1 - x)$ for all
$x \in [0, 1]$. We abbreviate and use $f(x)$. Any online assignment generated in response to
a right-to-left request function clearly satisfies $g(x, t) = g(x, 1 - x)$ for $t \ge 1 - x$. We
extend the definition of right-to-left request functions to cases other than S = T = 1 in
the obvious way. 

Consider a right-to-left request function $f(x)$ and the corresponding assignment
$g(x)$ generated by some k-competitive on-line algorithm. We wish to bound the value
of $g(x)$ at some point a. Since the interpretation of g is only by means of its integral, we
may assume without loss of generality that g is continuous from the left. Define a new
request function $f_a$ by: $f_a(x) = f(x)$ for $a \le x \le S$, and $f_a(x) = 0$ elsewhere. Define,
for $b > a$, $h_a(b) = \frac{1}{b} \int_a^b f(x)\,dx$, and $h(a) = \sup_{a < b \le S} \{h_a(b)\}$. Then $h(a) = \mathrm{OPT}$
with respect to $f_a$. (Note the analogy with $h_{i,j}$ and $h_i$ in the discrete model.) We denote
$W = \int_0^S f(x)\,dx$ and $W' = \int_0^S h(a)\,da$. The value of g in [a, S] must be the same for
$f$ and $f_a$, as g is produced by an on-line algorithm, thus $g(a) \le k h(a)$. Hence,

$$W = \int_0^S f(x)\,dx = \int_0^S g(x)\,dx \le k \int_0^S h(a)\,da = kW' , \qquad (6)$$

from which the lower bound $W/W'$ is readily obtained.
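
As a simple worked instance of the method, consider the constant right-to-left request function $f(x) = 1$ on [0, S]. This choice is ours, made only because it is the continuous counterpart of the equal-weights sequence of Sect. 3.1.

```latex
% Worked example with the assumed request function f(x) = 1 on [0, S]
\begin{align*}
h_a(b) &= \frac{1}{b}\int_a^b 1\,dx = 1 - \frac{a}{b}, &
h(a) &= \sup_{a < b \le S} h_a(b) = 1 - \frac{a}{S}, \\
W &= \int_0^S 1\,dx = S, &
W' &= \int_0^S \Bigl(1 - \frac{a}{S}\Bigr)\,da = \frac{S}{2}, \\
\frac{W}{W'} &= 2 .
\end{align*}
```

The constant request function thus reproduces the lower bound of 2; non-constant request functions are needed to push the ratio higher.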

Claim. A lower bound of e can be obtained with our method by considering the request 
function ^ in the limit $k \to \infty$ and $S \to \infty$.



Theorem 9. The lower bounds obtained by our method in the continuous model are 
valid in all discrete models as well, even in the presence of randomization with respect 
to oblivious adversaries. 



4 Temporary Jobs 

In this section we allow temporary jobs in the input. In Sect. 2 we saw an e-competitive 
algorithm for the fractional model; here we present an algorithm for the integral model 
which is 4-competitive in the unweighted case and 5-competitive in the weighted case. 
In the full paper we also show a lower bound of 3 for the unweighted integral model. 

Recall the definition of H in the Optimum Lemma. Consider the jobs which are
active upon job j’s arrival (including job j). Let H(j) be the value of H defined with
respect to these jobs. A server is saturated on the arrival of job j if its load is at least
$t \cdot H(j)$, where t is a constant to be determined below.



Algorithm PushRight 

Assign each job to its rightmost unsaturated eligible server.
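
A possible implementation of Algorithm PushRight with arrivals and departures is sketched below. It assumes that a server counts as saturated when its load is at least t·H(j), with H(j) computed over the currently active jobs including the arriving one; the class name `PushRight`, its methods and the job identifiers are hypothetical.

```python
class PushRight:
    """Integral assignment of temporary jobs on servers 1..n (illustrative sketch)."""

    def __init__(self, n, t=4):
        self.n, self.t = n, t
        self.load = [0] * (n + 1)        # current load per server
        self.requested = [0] * (n + 1)   # active weight requesting each server
        self.active = {}                 # job id -> (assigned server, requested server, weight)

    def _h(self):
        """H over the active jobs: max over s of W_s / s."""
        prefix, h = 0, 0
        for s in range(1, self.n + 1):
            prefix += self.requested[s]
            h = max(h, prefix / s)
        return h

    def arrive(self, job_id, s, w=1):
        self.requested[s] += w                    # H(j) includes the new job
        threshold = self.t * self._h()            # assumed saturation threshold
        for i in range(s, 0, -1):                 # rightmost eligible server first
            if self.load[i] < threshold:
                self.load[i] += w
                self.active[job_id] = (i, s, w)
                return i
        raise RuntimeError("all eligible servers are saturated")

    def depart(self, job_id):
        i, s, w = self.active.pop(job_id)
        self.load[i] -= w
        self.requested[s] -= w


pr = PushRight(n=5)
for k in range(10):
    pr.arrive(("a", k), 1)      # ten unit jobs restricted to server 1
for k in range(5):
    pr.arrive(("b", k), 5)      # five unit jobs that may use any server
for k in range(10):
    pr.depart(("a", k))         # the first ten jobs leave, so H drops
print(pr.arrive("c", 5))        # server 5 is now saturated; the job moves left
```

In the example the departures lower H to 6/5, so server 5 with load 5 meets the threshold 4.8 and the last job is pushed to server 4.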



Proposition 10. If $t \ge 4$, then whenever a job arrives, at least one of its eligible servers
is unsaturated. Thus, by taking t = 4 we get an algorithm which is 4- and 5-competitive
for the unweighted and weighted models, respectively.


Abstract. This paper deals with the on-line allocation of shared data objects to
the local memory modules of the nodes in a network. We assume that the data 
is organized in indivisible objects such as files, pages, or global variables. The 
data objects can be replicated and discarded over time in order to minimize the
communication load for read and write accesses done by the nodes in the network.
Non-uniform data management is characterized by a different communication
load for accesses to small pieces of the data objects and migrations of whole data 
objects. 

We introduce on-line algorithms that minimize the congestion, i.e., the maximum 
communication load over all links. Our algorithms are evaluated in a competi- 
tive analysis comparing the congestion produced by an on-line algorithm with the 
congestion produced by an optimal off-line algorithm. We present the first determi- 
nistic and distributed algorithm that achieves a constant competitive ratio on trees. 
Our algorithm minimizes not only the congestion but simultaneously
the load on each individual edge, up to an optimal factor of 3.

Algorithms for trees are of special interest as they can be used as a subroutine 
in algorithms for other networks. For example, using our tree algorithm as a 
subroutine in the recently introduced “access tree strategy” yields an algorithm 
that is $O(d \cdot \log n)$-competitive for d-dimensional meshes with n nodes. This
competitive ratio is known to be optimal for meshes of constant dimension. 



1 Introduction 

Large parallel and distributed systems - such as massively parallel processor systems 
(MPPs), networks of workstations (NOWs), or the Internet - consist of a set of nodes 
each having its own local memory module. In this paper, we consider the problem of
managing shared data that is read and written from the nodes in the network. Usually,
the data is organized in blocks, which we call data objects. The objects are, e.g., files 
on a distributed file server, pages in a virtual shared memory system, or global variables 
in a parallel program. The communication overhead for accessing the shared data can 
be reduced by storing copies of the data objects in the local storage of some processors. 
However, the data of an object should be kept together in order to reduce the bookkeeping 
overhead. 

Clearly, creating many copies reduces the communication overhead for read accesses. 
However, it increases the overhead for maintaining the copies consistent when a write 
access is issued. A data management strategy has to answer the following questions. 

- How many copies of an object should be made? 

- On which nodes should these copies be placed? 

- How should read and write requests be served? 

Typically, words or other small blocks of data that can be accessed or updated over the 
communication links are much smaller than the data objects. For example, large files 
usually consist of several small records, and pages of virtual memory consist of cache 
lines that can be accessed and updated individually. Thus, moving a data object can be 
much more expensive than accessing only a small piece of data from the object. 

The file allocation problem (FAP) is an abstract formulation of this non-uniform data 
management problem, which was introduced by Bartal et al. in [1]. The algorithms for 
FAP are evaluated in a competitive analysis that compares the communication cost of an 
on-line algorithm with the cost of an optimal off-line algorithm. For a given application 
A, let $C_{opt}(A)$ denote the minimum cost expended by an optimal off-line strategy. A
deterministic strategy is said to be c-competitive if it expends cost of at most $c \cdot C_{opt}(A)$,
for any application A. 

If the on-line algorithm uses randomization one has to describe the power given to the 
adversary more precisely. This is studied intensively in [2]. We always assume that the 
adversary is oblivious, i.e., the adversary is assumed to specify the whole sequence (e.g., 
in advance) without knowing the random bits of the on-line algorithm. A randomized 
strategy is said to be c-competitive if it expends expected cost of at most $c \cdot C_{opt}(A)$, for 
any application A. 

1.1 Formal Definition of FAP 

We are given a weighted undirected graph G = (V,E), where each node represents a 
processor with local memory module, the edges represent communication links, and 
the edge weights represent the bandwidths of these links. For $e \in E$, let $b(e)$ denote 
the bandwidth of e. Let X denote the set of data objects. At any time, for every object 
$x \in X$, let $R(x) \subseteq V$, the residence set, represent the set of nodes that hold a copy of 
x. We always require $R(x) \neq \emptyset$. Initially, only a single node contains a copy of x; this 
node is known by all nodes in the system. 

As time goes on, read and write requests occur at the processors. The requests are 
assumed to be generated by an adversary. The adversary initiates a sequence of requests 
$\sigma = \sigma_1 \sigma_2 \cdots$, where $\sigma_i$ corresponds to a read or write request issued by one of the nodes. 
Initially, we assume that the on-line algorithm has to serve these requests one after the 
other, that is, it is assumed that $\sigma_{i+1}$ is not issued before the reallocation for $\sigma_i$ is finished. 

At the end of this paper, in Section 5, we will show that all of the algorithms that we 
have developed in the sequential FAP model are able to handle parallel and overlapping 
requests, too. 

A read request to an object x requires access to a copy of x, a write request requires 
updating all copies of x. After a request is served, the on-line algorithm can decide 
how to reallocate the multiple copies of x. Any communication that proceeds along an 
edge increases the communication load of that edge by some amount, which depends 
on whether a read, write, or migration operation is performed. The increase is defined 
as follows. 

- Read operation: A read request for x issued by a node v can be served by any 
processor u holding a copy of x. A path has to be allocated through the network 
from v to u. The communication load on each edge e on this path increases by 
$1/b(e)$. 

- Write operation: A write request for x issued by a node v requires updating all 
copies of x. A multicast tree connecting v with all nodes holding a copy of x, i.e., 
a Steiner tree, has to be allocated. The communication load on each edge e in the 
Steiner tree increases by $1/b(e)$. 

- Object migration: The algorithm can replicate a copy of x from one node to another 
along an arbitrary path. The communication load on each edge e on this path 
increases by $D(x)/b(e)$, where $D(x) \ge 1$ is assumed to be an arbitrary integer 
representing the ratio between the load induced by the migration of x and the load 
induced by accessing only a unit of data of x. 
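
The three load rules above can be illustrated with a minimal sketch. The three-node path, the bandwidth values, and the helper name `add_load` are assumptions made only for this example; they do not come from the paper. Exact fractions are used so that the $1/b(e)$ and $D(x)/b(e)$ increments stay readable.

```python
from fractions import Fraction

def add_load(load, edges, bandwidth, amount):
    """Increase the load of every edge e in `edges` by amount / b(e)."""
    for e in edges:
        load[e] = load.get(e, Fraction(0)) + Fraction(amount, bandwidth[e])

# Hypothetical path u - w - v with bandwidths b(e) (illustrative values).
bandwidth = {("u", "w"): 2, ("w", "v"): 4}
load = {}

# v accesses the single copy held by u, so both edges are used each time.
route = [("w", "v"), ("u", "w")]

add_load(load, route, bandwidth, 1)  # read: 1/b(e) on each path edge
add_load(load, route, bandwidth, 1)  # write: 1/b(e) on each Steiner-tree edge
add_load(load, route, bandwidth, 8)  # migration of x with D(x) = 8: D(x)/b(e)

for edge, value in load.items():
    print(edge, value)
```

With a single copy, the Steiner tree of a write degenerates to the same path that a read uses; with several copies it would be the smallest subtree connecting the writer to all of them.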

Efficient algorithms for distributed data management have to work in a distributed 
fashion. In particular, the processors do not have knowledge about the global state of 
the system, that is, each processor notices only the read and write accesses and the copy 
migrations that pass the node. In order to accumulate additional knowledge a processor 
has to communicate with other processors, which also increases the load on the involved 
edges. 

- Exchange of Information: Information about the global state of the system, e.g. the 
actual residence set, can be exchanged by sending messages along a path from one 
node to another. It is assumed that the messages have small size, e.g., they include 
an ident-number of a data object and a tag for an action that should be performed on 
the receiving node. The communication load on each edge e on this path increases 
by $1/b(e)$. 

The original formulation of FAP does not consider communication overhead for the 
exchange of information between processors. Apart from that, it considers only a very 
simple cost measure, the total communication load, i.e., the sum of the load over all 
edges in the network. Additionally, we investigate the congestion, i.e., the maximum 
communication load over all links. We believe that the congestion is of special interest 
for practical algorithms, as keeping it small prevents individual links from becoming bottlenecks. 
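
To make the difference between the two cost measures concrete, here is a small sketch. The per-edge load values and the function names are illustrative assumptions, not data from the paper.

```python
def total_load(load):
    """Total communication load: the sum of the load over all edges."""
    return sum(load.values())

def congestion(load):
    """Congestion: the maximum load over all edges."""
    return max(load.values(), default=0)

# Hypothetical per-edge loads after serving some requests.
load = {("u", "w"): 5.0, ("w", "v"): 2.5, ("v", "z"): 0.5}

print(total_load(load))   # sum over all three edges
print(congestion(load))   # load of the most heavily used edge
```

Two strategies can have the same total load while differing greatly in congestion, which is why a bound on the load of every single edge is the stronger statement.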
1.2 Previous and Related Work 

Data management algorithms for trees are of special interest since many networks have a 
tree-like topology, e.g., Ethernet-connected NOWs. Besides, algorithms for trees can be 
used as a subroutine for data management strategies for other networks. Therefore, much 
research deals with data management on trees. For example, Bartal et al. [1] describe 
distributed strategies for FAP on trees that aim to minimize the total communication load. 
They introduce a randomized 3-competitive algorithm and a deterministic 9-competitive 
algorithm. Both competitive ratios are with respect to the total communication load. 

Lund et al. [6] describe a 3-competitive deterministic but centralized strategy for the 
same problem. The algorithm makes use of global knowledge about a work function 
which is influenced by any request issued in the network. Ignoring cost for information 
exchange, the algorithm minimizes the load on any edge up to a factor of 3, and, hence, 
it achieves competitive ratio 3 with respect to the total communication load and the 
congestion. Unfortunately, this algorithm is inherently centralized. 

Data management strategies for trees and meshes that aim to minimize the congestion 
in a uniform cost model (i.e., $D(x) = 1$) are given by Maggs et al. [7]. The uniform 
costs simplify the problem significantly. They present a 3-competitive strategy for trees, 
and an $O(d \cdot \log n)$-competitive strategy for d-dimensional meshes with n nodes. Both 
competitive ratios are with respect to the congestion in the uniform model. Further, they 
present strategies for Internet-like clustered networks, and a lower bound of $\Omega(\log n / d)$ 
on the best possible competitive ratio for data management on meshes. 

A lower bound that holds for any network including at least one edge is shown by 
Black and Sleator [3]. More precisely, they give a lower bound of 3 on the competitive 
ratio of data migration algorithms on two processors connected by a single edge, which 
implies that the best possible competitive ratio for the total load and the congestion 
in any network is 3. Bartal et al. [1] show that the bound holds also for randomized 
algorithms. 

A difficult problem that has to be solved by any distributed data management strategy 
is the data tracking, i.e., the problem of how to locate the copies of a particular object. 
To our knowledge, data tracking mechanisms that aim to minimize the congestion have 
not been investigated previously. Some results (see, e.g., [1]) are known for the total 
communication load. 

1.3 Our Results 

First, we describe a strategy for FAP on two nodes connected by a single edge. This 
strategy is 3-competitive, and it can be computed in a distributed fashion by the two 
connected nodes. The key feature of the edge strategy is that it can be extended to work 
on trees just by simulating it on any edge in the tree. The result of this approach is a simple, 
deterministic, and distributed strategy for FAP on trees which minimizes the load on 
any edge up to a factor of 3. This result is optimal because of the lower bound in [3]. Obviously, 
the bound on the load of any edge implies that our tree strategy is 3-competitive 
with respect to the total communication load and the congestion simultaneously. 

Further, we present a distributed strategy for FAP on meshes. We use a variant of 
the locality preserving embedding of so-called “access trees” introduced in [7], and 
simulate our tree strategies on the access trees. We show that this simulation approach 
yields a data management strategy having competitive ratio $O(d \cdot \log n)$ with respect to 
the congestion, where d denotes the dimension and n the number of nodes of the mesh. 
The strategy is randomized, and the bound on the congestion does not only hold for the 
expectation but holds with high probability (w.h.p.), i.e., with probability $1 - n^{-\alpha}$, where 
$\alpha$ is an arbitrary constant. The lower bound presented in [7] shows that this competitive 
ratio is optimal for meshes of constant dimension. 

The 3-competitive tree strategy can not only be used as a subroutine for data management 
on meshes but also on other networks, e.g., Internet-like clustered networks 
as defined in [7]. Plugging in our tree strategy for FAP as a subroutine in the data management 
strategy on clustered networks instead of the tree strategy used in [7] yields 
close-to-optimal congestion for FAP on clustered networks. In fact, all bounds shown 
there for data management with uniform migration cost can be extended immediately to 
FAP with non-uniform cost. 

At the end of this paper, we show that all presented bounds hold also for “data-race 
free programs” including overlapping and parallel requests, which illustrates the 
practical usage on real world computer networks. The major restriction of data-race free 
programs is that parallel write accesses to the same data object have to be protected 
by synchronization mechanisms like barriers or locks. In fact, all of our strategies can 
be used even in a fully asynchronous setting allowing arbitrary overlappings of read 
and write accesses; only the theoretical model breaks down if applications with data-race 
conditions are used. Uniform variants (i.e., $D(x) = 1$) 
of the presented strategies for meshes have been implemented in the DIVA (Distributed 
Variables) library [5]. Results of an experimental evaluation showing that these strategies 
are very competitive also in practice can be found in [4,9]. 



2 File Allocation on a Single Edge 

We describe a deterministic and distributed file allocation strategy for a single edge e 
connecting two nodes a and b. The strategy uses a simple counting mechanism which 
records read and write accesses that are issued on the two nodes. Later on we will use 
this strategy for building our tree strategy. Then the nodes a and b correspond to the two 
connected components into which the tree is divided if e is removed. 

Initially, we do not care about how the processors a and b exchange information 
about the residence set, and how the counters are distributed among them. We assume 
that node a always knows whether or not node b holds a copy and vice versa. Afterwards 
we show how our strategy can be adapted to the distributed setting. 

2.1 The Centralized Edge Strategy 

Each object x is handled independently from the other objects. Let us fix an object x. 
Define $D = D(x)$. Concerning this object there are two counters $c_a$ and $c_b$. Informally, 
these counters represent savings accounts for cost referring to x. 

Initially, one copy of x is placed on one of the two nodes. Assume that it is placed 
on a. Then $c_a$ is set to D, and $c_b$ is set to 0. In Fig. 1 the edge strategy is described for 







F. Meyer auf der Heide, B. Vocking, and M. Westermann 



- Node a issues a read request for object x: 

  If $c_a < D$ then $c_a := c_a + 1$; 

  If $c_a = D$ and a holds no copy of x then 

    • Move a new copy of x onto a; 

    • If $c_b = 0$ then delete the copy of x on b; 

- Node a issues a write request for object x: 

  If $c_b > 0$ then 

    • $c_b := c_b - 1$; 

  else 

    • If $c_a < D$ then $c_a := c_a + 1$; 

  If $c_a = D$ and a holds no copy of x then move a new copy of x onto a; 
  If $c_b = 0$ and a holds a copy of x then delete the copy of x on b; 

Fig. 1. The edge strategy for requests issued by node a. 



requests issued by node a. The strategy works analogously for requests issued by node 
b. Note that the edge strategy always keeps one copy of x, since it only deletes a copy 
on a node if the other node also holds a copy. 
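
The counter rules of Fig. 1 can be sketched as a small class. This is a minimal illustration under stated assumptions: the class name `EdgeStrategy`, its method names, and the decision to track only counters and the residence set (not the load) are choices made for this example, and the counter threshold is read as D.

```python
class EdgeStrategy:
    """Counters and residence set of one object x on an edge {a, b}."""

    def __init__(self, D, initial="a"):
        other = "b" if initial == "a" else "a"
        self.D = D
        self.c = {initial: D, other: 0}  # c_a = D, c_b = 0 if x starts on a
        self.copies = {initial}          # residence set R(x)

    def read(self, v):
        """Node v issues a read request for x."""
        u = "b" if v == "a" else "a"
        if self.c[v] < self.D:
            self.c[v] += 1
        if self.c[v] == self.D and v not in self.copies:
            self.copies.add(v)           # move a new copy of x onto v
            if self.c[u] == 0:
                self.copies.discard(u)   # delete the copy on the other node

    def write(self, v):
        """Node v issues a write request for x."""
        u = "b" if v == "a" else "a"
        if self.c[u] > 0:
            self.c[u] -= 1
        elif self.c[v] < self.D:
            self.c[v] += 1
        if self.c[v] == self.D and v not in self.copies:
            self.copies.add(v)           # move a new copy of x onto v
        if self.c[u] == 0 and v in self.copies:
            self.copies.discard(u)       # delete the copy on the other node


s = EdgeStrategy(D=2, initial="a")
s.read("b"); s.read("b")     # b's counter reaches D, so b receives a copy
print(sorted(s.copies))
s.write("b"); s.write("b")   # a's counter drops to 0, so a's copy is deleted
print(sorted(s.copies), s.c)
```

In the usage lines, a copy is only ever deleted on one node while the other node holds one, so the residence set never becomes empty.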

Lemma 1. The centralized edge strategy minimizes the load up to a factor of 3. 

Proof. We use a potential function argument (cf. [8]). First, let us fix an optimal off-line 
strategy, which we call the optimal strategy in the following. We assume that the 
optimal strategy, in contrast to the on-line strategy, reallocates its residence set before 
serving a request. W.l.o.g., the optimal strategy fulfills the following properties. 

- If a node v issues a read request to x, then the optimal strategy does not delete a 
copy, that is, the only possible change of the residence set is that a new copy of x is 
moved to v. 

- If a node v issues a write request to x, then the only possible changes of the residence 
set are that a new copy of x is moved to v and/or a copy of x is deleted on the neighbor 
of v. 

Fix a sequence of read and write requests $\sigma = \sigma_1 \sigma_2 \cdots$. Let $L_{edge}(t)$ and $L_{opt}(t)$ 
denote the load of the edge strategy and the optimal strategy, respectively, after serving 
$\sigma_t$, and let $\Phi(t)$ denote the value of a potential function after serving $\sigma_t$, which is defined 
in detail later. In order to prove the lemma, we have to show that 

(a) $L_{edge}(t) + \Phi(t) \le 3 \cdot L_{opt}(t)$ and 

(b) $\Phi(t) \ge 0$. 

Let $c_a(t)$ and $c_b(t)$ denote the value of the counter $c_a$ and the value of the counter 
$c_b$, respectively, after serving $\sigma_t$, and let $R_{edge}(t)$ and $R_{opt}(t)$ denote the residence set 
of the edge strategy and the optimal strategy, respectively, after serving $\sigma_t$. We define 

$$\Phi(t) = \Phi_a(t) + \Phi_b(t) - D,$$

where, for $v \in \{a, b\}$, 

$$\Phi_v(t) = \begin{cases}
2 c_v(t), & \text{if } v \notin R_{edge}(t) \text{ and } v \notin R_{opt}(t), \\
3D - c_v(t), & \text{if } v \notin R_{edge}(t) \text{ and } v \in R_{opt}(t), \\
3D - 2 c_v(t), & \text{if } v \in R_{edge}(t) \text{ and } v \in R_{opt}(t), \\
c_v(t), & \text{if } v \in R_{edge}(t) \text{ and } v \notin R_{opt}(t).
\end{cases}$$

Table 1. Possible changes of configuration if node a issues a read request. 



First, we prove (b). The optimal strategy always holds a copy of x on a node. Hence, 
$v \in R_{opt}(t)$ for some node $v \in \{a, b\}$, and, consequently, $\Phi_v(t) \ge D$, which implies 
$\Phi_a(t) + \Phi_b(t) \ge D$. Hence, (b) is shown. 
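
Property (b) can also be checked mechanically for small D. The sketch below assumes the four-case reading of the potential (twice the counter if v is in neither residence set, 3D minus the counter if v is only in the optimal set, 3D minus twice the counter if v is in both, and the counter itself if v is only in the edge strategy's set) and counters ranging over 0, ..., D. The function names are hypothetical.

```python
from itertools import product

def phi_v(c_v, in_edge, in_opt, D):
    """Potential contribution of one node v, by the four cases."""
    if not in_edge and not in_opt:
        return 2 * c_v
    if not in_edge and in_opt:
        return 3 * D - c_v
    if in_edge and in_opt:
        return 3 * D - 2 * c_v
    return c_v  # v holds a copy in the edge strategy only

def phi(c, R_edge, R_opt, D):
    """Phi = Phi_a + Phi_b - D."""
    return sum(phi_v(c[v], v in R_edge, v in R_opt, D) for v in "ab") - D

def min_potential(D):
    """Smallest Phi over all counter values and non-empty residence sets."""
    nonempty = [{"a"}, {"b"}, {"a", "b"}]
    return min(
        phi({"a": ca, "b": cb}, R_edge, R_opt, D)
        for ca, cb in product(range(D + 1), repeat=2)
        for R_edge in nonempty
        for R_opt in nonempty
    )

for D in range(1, 5):
    print(D, min_potential(D))
```

The minimum is attained, for instance, in the initial configuration: both strategies hold the only copy on a, with the counter of a equal to D and the counter of b equal to 0.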

Now we prove (a) by induction on the length of $\sigma$. Obviously, (a) holds for the initial 
setting. For the induction step suppose that $L_{edge}(t) + \Phi(t) \le 3 \cdot L_{opt}(t)$. Let 
$\Delta L_{edge} = L_{edge}(t+1) - L_{edge}(t)$, $\Delta L_{opt} = L_{opt}(t+1) - L_{opt}(t)$, $\Delta\Phi_a = \Phi_a(t+1) - \Phi_a(t)$, 
and $\Delta\Phi_b = \Phi_b(t+1) - \Phi_b(t)$. In order to prove the induction step, we show that 

$$\Delta L_{edge} + \Delta\Phi_a + \Delta\Phi_b \le 3 \cdot \Delta L_{opt}. \qquad (1)$$

We distinguish between read and write requests. 

- $\sigma_{t+1}$ is a read request issued by node a. In this case equation (1) can be checked 
with Table 1 containing all possible changes of configuration. Note that, if a issues 
a read request, the only possible changes of the residence sets are that one of the 
strategies moves a copy to a, or $c_b = 0$ and the edge strategy deletes the copy on b. 
In both cases, $\Delta\Phi_b = 0$, and, hence, $\Delta\Phi = \Delta\Phi_a$. 

- $\sigma_{t+1}$ is a write request from node a. We distinguish between the cases $c_b(t) > 0$ 
and $c_b(t) = 0$. Note that, if a issues a write request, the only possible changes of the 
residence set of the optimal strategy are that a new copy of x is moved to a and/or a 
copy of x is deleted on b. 

• Suppose $c_b(t) > 0$. In this case equation (1) can be checked with Table 2 containing 
all possible changes of configuration. Note that, if a issues a write request 
and $c_b(t) > 0$, the only possible transitions of the edge strategy are from {a} to 
{a}, from {b} to {b}, or from {a, b} to {a, b} or {a}. 





$R_{edge}(t)$  $R_{opt}(t)$  $R_{edge}(t+1)$  $R_{opt}(t+1)$  $\Delta L_{edge}$  $\Delta\Phi_a \le$  $\Delta\Phi_b \le$  $\Delta L_{opt}$
a              a             a                a               0                  0                   -2                  0
a              a,b           a                a,b             0                  0                   1                   1
a              a,b           a                a               0                  0                   -2                  0
a              b             a                b               0                  0                   1                   1
a              b             a                a,b             0                  3D                  1                   1 + D
a              b             a                a               0                  3D                  -2                  D
b              a             b                a               1                  0                   -1                  0
b              a,b           b                a,b             1                  0                   2                   1
b              a,b           b                a               1                  0                   -1                  0
b              b             b                b               1                  0                   2                   1
b              b             b                a,b             1                  3D                  2                   1 + D
b              b             b                a               1                  3D                  -1                  D
a,b            a             a,b              a               1                  0                   -1                  0
a,b            a,b           a,b              a,b             1                  0                   2                   1
a,b            a,b           a,b              a               1                  0                   -1                  0
a,b            b             a,b              b               1                  0                   2                   1
a,b            b             a,b              a,b             1                  3D                  2                   1 + D
a,b            b             a,b              a               1                  3D                  -1                  D
a,b            a             a                a               1                  0                   -1                  0
a,b            a,b           a                a,b             1                  0                   2                   1
a,b            a,b           a                a               1                  0                   2 - 3D              0
a,b            b             a                b               1                  0                   2                   1
a,b            b             a                a,b             1                  3D                  2                   1 + D
a,b            b             a                a               1                  3D                  2 - 3D              D

Table 2. Possible changes of configuration if node a issues a write request and $c_b(t) > 0$. 
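
Checking equation (1) against Table 2 amounts to simple arithmetic per row: the load increase of the edge strategy plus the two potential bounds must not exceed three times the load increase of the optimal strategy. The sketch below lists the ten distinct bound patterns that occur in the rows of Table 2 and tests them for a range of D. The function names are hypothetical, and the check relies on D being at least 1.

```python
def table2_bounds(D):
    """Distinct (dL_edge, dPhi_a bound, dPhi_b bound, dL_opt) patterns of Table 2."""
    return [
        (0, 0, -2, 0),
        (0, 0, 1, 1),
        (0, 3 * D, 1, 1 + D),
        (0, 3 * D, -2, D),
        (1, 0, -1, 0),
        (1, 0, 2, 1),
        (1, 3 * D, 2, 1 + D),
        (1, 3 * D, -1, D),
        (1, 0, 2 - 3 * D, 0),
        (1, 3 * D, 2 - 3 * D, D),
    ]

def satisfies_eq1(row):
    """Check dL_edge + dPhi_a + dPhi_b <= 3 * dL_opt for one row."""
    dL_edge, dPhi_a, dPhi_b, dL_opt = row
    return dL_edge + dPhi_a + dPhi_b <= 3 * dL_opt

ok = all(satisfies_eq1(row) for D in range(1, 20) for row in table2_bounds(D))
print(ok)
```

Several rows are tight, for example (1, 0, 2, 1) gives 3 on both sides, and the two rows containing 2 - 3D hold exactly because D is at least 1.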



• Suppose $c_b(t) = 0$. In this case equation (1) can be checked with Table 3 containing 
all possible changes of configuration. Note that, if a issues a write request 
and $c_b(t) = 0$, the only possible transitions of the edge strategy are from {a} to 
{a} or from {b} to {b} or {a}. 



2.2 The Distributed Edge Strategy 

Next, we describe how the edge strategy can be adapted from the centralized to the 
distributed setting. Each node always keeps the current value of both counters $c_a$ and 
$c_b$. It is obvious that this assignment makes it possible for the nodes to make the right 
decisions according to the edge strategy. Now we specify how each node keeps track of 
both counters $c_a$ and $c_b$. W.l.o.g., consider node a. 

- A read request for x is issued by b. If b holds no copy of x then a is able to update 
its counters. In the other case, b sends an information message along e if and only 
if b has increased its counter $c_b$. 
Table 3. Possible changes of configuration if node a issues a write request and $c_b(t) = 0$. 



- A write request for x is issued by b. If a holds a copy of x then a is able to update 
its counters. In the other case, b sends an information message along e if and only 
if b has decreased its counter $c_a$ or increased its counter $c_b$. 

In this way, each node is able to keep both of its counters up-to-date. The following 
lemma shows that the additional information messages are sent very rarely. 

Lemma 2. The distributed edge strategy minimizes the load up to a factor of 3. 

Proof. We adopt the notations and definitions of Lemma 1. An information message is 
only sent if a request changes the value of a counter. Thus, the tables used for proving 
Lemma 1 change only slightly. 

- $\sigma_{t+1}$ is a read request issued by node a. Then an additional message is sent only if 
$a \in R_{edge}(t)$. In this case, $\Delta L_{edge}$ equals 1 rather than 0. Besides, $\Delta\Phi$ equals $-2$ 
rather than 0 if $a \in R_{opt}(t)$ and $c_a(t) < D$. It is easy to check that equation (1) is 
still satisfied by applying these changes to Table 1. 

- $\sigma_{t+1}$ is a write request issued by node a. Then an additional message is sent only 
if $b \notin R_{edge}(t)$. In this case, $\Delta L_{edge}$ equals 1 rather than 0. Further, $\Delta\Phi$ equals $-2$ 
rather than 0 if $a \in R_{opt}(t)$ and $c_a(t) < D$. It is easy to check that equation (1) is 
still satisfied by applying these changes to Tables 2 and 3. 
3 File Allocation on Trees 

The distributed edge strategy can be extended to tree-connected networks. The network 
is modeled by a graph T = (V,E) without cycles, i.e., a tree. The edges in the tree are 
allowed to have arbitrary bandwidths. 

The tree strategy is composed of $|E|$ individual edge strategies. The idea is to 
simulate the distributed edge strategy on each edge. Consider an arbitrary edge $e = (a, b)$. 
The removal of e divides T into two subtrees $T_a$ and $T_b$ containing a and b, 
respectively. The two nodes a and b execute the algorithm described in Fig. 1. The 
phrases “if a holds a (no) copy of x” and “node a issues a read (write) request for object 
x” are just replaced by “if a node in $T_a$ holds a (no) copy of x” and “a node in $T_a$ issues 
a read (write) request for object x”, respectively. 
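
To simulate the edge strategy on an edge of the tree, each request has to be attributed to one side of that edge. The following sketch computes the node set of the subtree that contains a once the edge {a, b} is removed. The adjacency-list representation, the example tree, and the function name `side_of_edge` are assumptions made for this illustration.

```python
def side_of_edge(adj, a, b):
    """Nodes of the subtree containing a after removing the edge {a, b}."""
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if (u, w) in ((a, b), (b, a)) or w in seen:
                continue  # never cross the removed edge, never revisit
            seen.add(w)
            stack.append(w)
    return seen

# Hypothetical tree: node 2 is adjacent to nodes 1, 3 and 4.
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

side_a = side_of_edge(adj, 2, 4)   # subtree containing node 2
side_b = side_of_edge(adj, 4, 2)   # subtree containing node 4
print(sorted(side_a), sorted(side_b))
```

For the edge between 2 and 4, a request issued by node 1 or node 3 is treated as if it were issued by node 2 itself, and a copy held anywhere in the subtree of node 2 counts as a copy held by node 2.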

The simulation works properly as long as the nodes in the residence set $R(x)$ form 
a connected component in the tree. A key feature of our edge strategy is that it fulfills 
this condition. This is shown in the following lemma. 

Lemma 3. The graph induced by the residence set $R(x)$ is always a connected component. 

Proof. (Sketch) Via induction on the length of the sequence of requests it can be shown 
that those counters on any simple path in the tree that are responsible for moving a copy 
along an edge towards the first node of the path are non-decreasing from the first to the 
last node on the path. Hence, all these counters “agree” about the distribution of copies. 
This ensures that all copies stay in a connected component. 

Note that the tree algorithm also does not need any additional information exchange 
apart from the one done by the distributed edge strategy. Therefore, the following theorem 
follows immediately from Lemma 2. 

Theorem 4. The tree strategy minimizes the load on any edge up to a factor of 3. 



4 File Allocation on Meshes 

In this section, we consider strategies for the mesh $M = M(m_1, \ldots, m_d)$, i.e., the 
d-dimensional mesh-connected network with side length $m_i \ge 2$ in dimension i. The 
number of processors is denoted by n, i.e., $n = m_1 \cdots m_d$. Each edge is assumed to have 
bandwidth 1. 

The strategy uses a locality preserving embedding of “access trees” introduced in 
[7]. It is based on a hierarchical decomposition of M, which we describe recursively. 
Let i be the smallest index such that $m_i = \max\{m_1, \ldots, m_d\}$. If $m_i = 1$ then we have 
reached the end of the recursion. Otherwise, we partition M into two non-overlapping 
submeshes $M_1 = M(m_1, \ldots, \lceil m_i/2 \rceil, \ldots, m_d)$ and $M_2 = M(m_1, \ldots, \lfloor m_i/2 \rfloor, \ldots, m_d)$. 
$M_1$ and $M_2$ are then decomposed recursively according to the same rules. 

The hierarchical decomposition has associated with it a decomposition tree $T(M)$, in 
which each node corresponds to one of the submeshes, i.e., the root of $T(M)$ corresponds 
to M itself, and the children of a node v in the tree correspond to the two submeshes into 
which the submesh corresponding to v is divided. Thus, $T(M)$ is a binary tree of height 
$O(\log n)$ in which the leaves correspond to submeshes of size one, i.e., to the processors 
of M. For each node v in $T(M)$, let $M(v)$ denote the corresponding submesh. 
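
The recursive decomposition can be sketched in a few lines. This is an illustration under simplifying assumptions: each tree node stores only the side lengths of its submesh, not its position inside the mesh, and the nested-tuple representation and function names are chosen for this example only.

```python
def decompose(sides):
    """Decomposition tree of the mesh with the given side lengths.

    Returns nested tuples (sides, children); a leaf has no children.
    """
    m = max(sides)
    if m == 1:
        return (tuple(sides), [])            # submesh of size one
    i = list(sides).index(m)                 # smallest index of a longest side
    upper = list(sides)
    upper[i] = (m + 1) // 2                  # ceil(m_i / 2)
    lower = list(sides)
    lower[i] = m // 2                        # floor(m_i / 2)
    return (tuple(sides), [decompose(upper), decompose(lower)])

def leaves(node):
    """Number of leaves, i.e. processors, below a tree node."""
    _, children = node
    return 1 if not children else sum(leaves(c) for c in children)

def height(node):
    """Height of the decomposition tree below a node."""
    _, children = node
    return 0 if not children else 1 + max(height(c) for c in children)

tree = decompose([4, 3])
print(leaves(tree), height(tree))
```

For the 4 x 3 mesh the tree has one leaf per processor, and its height matches the logarithmic bound stated above, since every split at least roughly halves the longest side.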

For each object $x \in X$, define an access tree $T_x(M)$ to be a copy of the decomposition 
tree $T(M)$. We embed the access trees randomly into M, i.e., for each $x \in X$, each 
interior node v of $T_x(M)$ is mapped uniformly at random to one of the processors in 
$M(v)$, and each leaf v of $T_x(M)$ is mapped onto the only processor in $M(v)$. 

The remaining description of our data management strategy is very simple: For 
object $x \in X$, we simulate the tree strategy on the access tree $T_x(M)$. All messages 
that should be sent between neighboring nodes in the access trees are sent along the 
dimension-by-dimension order paths between the associated nodes in the mesh, i.e., the 
unique shortest path between the two nodes using first edges of dimension 1, then edges 
of dimension 2, and so on. 
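
A dimension-by-dimension order path is straightforward to compute. The sketch below represents mesh nodes as coordinate tuples, which is an assumption made for this example, as is the function name.

```python
def dimension_order_path(src, dst):
    """Nodes visited from src to dst, correcting dimension 1 first, then 2, ..."""
    path, cur = [tuple(src)], list(src)
    for i in range(len(src)):
        step = 1 if dst[i] > cur[i] else -1
        while cur[i] != dst[i]:
            cur[i] += step                  # one edge of dimension i + 1
            path.append(tuple(cur))
    return path

print(dimension_order_path((0, 0), (2, 1)))
```

The path uses exactly as many edges as the coordinates differ in total, so it is a shortest path, and it is unique because the order of the dimensions is fixed.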

The access tree nodes have to be remapped dynamically when too many access 
messages, i.e., messages that simulate messages of the tree strategy, traverse a node. 
The remapping is done as follows. For every object x, and every node v of the access 
tree $T_x(M)$ we add a counter $t(x, v)$. Initially, this counter is set to 0, and the counter 
$t(x, v)$ is increased by 1 whenever an access message for object x traverses node v, 
starts at node v, or arrives at node v. When the counter $t(x, v)$ reaches K the node v is 
remapped randomly to another node in $M(v)$, where K is some integer of suitable size, 
i.e., $K = O(D(x))$. Remapping v to a new host means that we have to send a migration 
message that informs the new host about the migration and, if the old host holds a copy 
of x, moves the copy to the new host. Migration messages reset the counter $t(x, v)$ to 
0. Furthermore, we have to send notification messages including information about the 
new host to the mesh nodes that hold the access tree neighbors of v. These notification 
messages also increase the counters at their destination nodes. The counter mechanism 
ensures that the load due to messages that are directed to an access tree node embedded 
on a randomly selected host is $O(K) = O(D(x))$. 
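
The counter mechanism can be sketched for a single access tree node as follows. This is a simplified illustration: the class and attribute names are hypothetical, the concrete value of K is left to the caller, and the migration of the copy as well as the notification messages to the access tree neighbors are omitted.

```python
import random

class AccessTreeNode:
    """One access tree node v of an object x, embedded in its submesh M(v)."""

    def __init__(self, submesh, K, rng):
        self.submesh = list(submesh)   # processors of M(v)
        self.K = K                     # remapping threshold
        self.rng = rng
        self.host = rng.choice(self.submesh)
        self.t = 0                     # counter t(x, v)
        self.remaps = 0

    def on_access_message(self):
        """An access message for x traverses, starts at, or arrives at v."""
        self.t += 1
        if self.t >= self.K:
            self.remap()

    def remap(self):
        """Move v to another random host in M(v) and reset the counter."""
        others = [p for p in self.submesh if p != self.host]
        self.host = self.rng.choice(others or self.submesh)
        self.t = 0
        self.remaps += 1

node = AccessTreeNode([(0, 0), (0, 1), (1, 0), (1, 1)], K=3, rng=random.Random(0))
for _ in range(7):
    node.on_access_message()
print(node.remaps, node.t)
```

Between two remappings a host sees at most K access messages for this tree node, which is the property the load bound above relies on.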

The following theorem gives the competitive ratio of the access tree strategy for 
d-dimensional meshes with n nodes. It can be obtained from the analysis in [7]. 

Theorem 5. The access tree strategy is $O(d \cdot \log n)$-competitive with respect to the 
congestion, w.h.p., for meshes of dimension d with n nodes. 

5 Extending the Results to Data-Race Free Applications 

An important class that allows concurrent read accesses is the class of data-race free 
applications, which is defined as follows. We assume that an adversary specifies a parallel 
application running on the nodes of the network, i.e., the adversary initiates read and 
write requests on the nodes of the network. A write access to an object is not allowed 
to overlap with other accesses to the same object, and there is some order among the 
accesses to the same object such that, for each read and write access, there is a unique 
least recent write. Note that this still allows arbitrary concurrent accesses to different 
objects and concurrent read accesses to the same object. An execution using a dynamic 
data management strategy is called consistent if it ensures that a read request directed 
to an object always returns the value of the most recent write access to the same object. 







A data management strategy is allowed to migrate, create, and invalidate copies of an 
object during execution time. We use the same cost metric as defined in the Introduction, 
that is, any message except for migration messages increases the load of an edge e by 
$1/b(e)$. Migration messages of an object x increase the load by $D(x)/b(e)$. 

We have to describe how parallel accesses are handled by the presented strategy 
such that the execution is consistent. On trees this works as follows. Since we consider 
only data-race free applications, write accesses do not overlap with other accesses. 
Overlapping read accesses are handled in the following way. Consider a request message 
M arriving on a node u that does not hold a copy of x. Let e denote the next edge on the 
path to the nearest copy. Suppose another request message M' directed to x has been sent 
already along e but a data message has not yet been sent back. Then the request message 
M is blocked on node u until the data message corresponding to M' passes e. When 
this message arrives, either a new copy is created on node u, and u serves the request 
message M, or M continues its path to the connected component of copies. As meshes 
and clustered networks simulate the tree strategy on access trees that are embedded in 
the network, they can follow the same approach. 

We can conclude that all competitive ratios given in this paper hold also for data- 
race free applications, which indicates that the introduced strategies are well suited for 
practical usage. 
Abstract. A major task of telecommunication network planners is deciding where 
spare capacity is needed, and how much, so that interrupted traffic may be rerouted 
in the event of a failure. Planning the spare capacity so as to minimize cost is an 
NP-hard problem, and for large networks, even the linear relaxation is too large 
to be solved with existing methods. The main contribution of this paper is a fast 
algorithm for restoration capacity planning with a proven performance ratio of at 
most $2 + \epsilon$, and which generates solutions that are at most 1% away from optimal 
in empirical studies on a range of networks, with up to a few hundred nodes. 

As a preliminary step, we present the first $(1 + \epsilon)$-approximation algorithm for 
restoration capacity planning. The algorithm could be practical for moderate-size 
networks. It requires the solution of a multicommodity-flow type linear program 
with $O(m|G|)$ commodities, however, where G is the set of distinct traffic routes, 
and therefore $O(m^2|G|)$ variables. For many networks of practical interest, 
this results in programs too large to be handled with current linear programming 
technology. Our second result, therefore, has greater practical relevance: a 
$(2 + \epsilon)$-approximation algorithm that requires only the solution of a linear program with 
$O(m)$ commodities, and hence $O(m^2)$ variables. The linear program has been 
of manageable size for all practical telecommunications network instances that 
have arisen in the authors’ applications, and we present an implementation of the 
algorithm and an experimental evaluation showing that it is within 1% of optimal 
on a range of networks arising in practice. 

We also consider a more general problem in which both service and restoration 
routes are computed together. Both approximation algorithms extend to this case, 
with approximation ratios of $1 + \epsilon$ and $4 + \epsilon$, respectively. 



1 Introduction 

Modern telecommunications networks are designed to be highly fault tolerant. Customers 
expect to see uninterrupted service, even in the event of faults such as power 
outages, equipment failures, natural disasters and cable cuts. Typically networks are 
engineered to guarantee complete restoration of disrupted services in the event of any 
single catastrophic failure (such as a fiber cut). For this to be possible, spare capacity 
must be added to the network so that traffic that has been interrupted by a fault can be 
rerouted. 

Restoration capacity is a sizable fraction of total network capacity, and hence accounts 
for a large part of the infrastructure cost of telecommunications networks. It is 
therefore essential for network planners to have efficient and effective algorithms for deciding 
where restoration capacity is needed, and how much. This is known variously as 
restoration capacity planning, capacitated survivable network design, resilient capacity 
reservation, or spare capacity assignment. Typical applications require that capacity be 
allocated in discrete units, and that traffic flows are indivisible. With these requirements, 
the problem becomes NP-hard [4]. 

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 101-115, 1999. 

Modern telecommunications networks may involve hundreds of offices and fiber 
routes. For such large networks, even the linear relaxation of the natural integer program 
formulations cannot be solved, as it is too large for current linear program solvers. However, 
advances in linear programming methods and in processor power have recently 
allowed near-optimal solution of the restoration capacity planning problem for some 
reasonably large networks. Cwilich et al. [3] describe a system based on column generation 
that exactly solves a linear relaxation of the restoration capacity planning problem, 
and that has been used in the field. While column generation works very well in practice, 
and is typically faster than solution of equivalent multicommodity-flow based formulations, 
it has not been proven to run in polynomial time. In addition, there are existing 
telecommunications networks that are too large for the column generation method to 
solve effectively. 

The first contribution of this paper is an application of randomized rounding [13] 
to the multicommodity flow formulation to derive the first polynomial-time 
$(1 + \epsilon)$-approximation algorithm for the restoration capacity planning problem. This result is of 
theoretical interest only, since in practice the column generation approach, with suitable 
rounding, is faster and yields comparable results. 

The major contribution of this paper is a new approximation algorithm, LBAlg, that
is fast and effective on large networks. Cwilich et al. [4] describe a linear program that
gives a lower bound for the restoration capacity planning problem, and find empirically
that the lower bound is extremely close to the upper bound found by column generation.
We show here that the lower bound is tight within a factor of two, and we use the
lower bound as the basis of LBAlg. We prove that LBAlg is a (2 + ε)-approximation
algorithm. It is much faster, both theoretically and in practice, than both the flow-based
(1 + ε)-approximation and the column-generation algorithm, so it is useful for large
networks that are beyond the reach of the existing methods. We present an empirical study
in which LBAlg consistently produces solutions that are within 1% of the lower bound,
and hence within 1% of optimal, on a range of networks that have arisen in practice.
(Column generation achieves the same quality solutions on the smaller networks in the
study.) The running time of LBAlg ranged from a few minutes on a network with 58
nodes, to a few hours on a network with 452 nodes.

The third contribution of this paper concerns situations when service routing and
restoration capacity may be optimized together.¹ In the standard restoration capacity
assignment problem, the service routes are taken as given, but it may be possible to reduce
the total network cost if the service and restoration routes are optimized at the same time.
We give a polynomial-time (1 + ε)-approximation algorithm for this more general problem
and show that the (2 + ε)-approximation algorithm for fixed service routes can be combined
with shortest-path routing to give a (4 + ε)-approximation algorithm.

¹ This seems to happen infrequently in practice; service routes are generally optimized
according to other measures, such as minimal delay or customer requirements, and then
restoration capacity is added separately.

1.1 Related Work 

Alevras, Grötschel and Wessäly [1] give a good survey of models for restoration capacity
planning (which they term capacitated survivable network design). Herzberg, Bye
and Utano [8] and Iraschko, MacGregor and Grover [9] give integer programming for- 
mulations for various restoration capacity planning problems, based on enumerating all 
possible restoration paths between demand endpoints. These formulations are turned 
into practical methods by restricting the set of restoration paths by various heuristics. 
Cwilich et al. [3] substantially improve these methods by the use of LP column gene- 
ration techniques, which allow intelligent search of the space of restoration paths. No 
approximation guarantees or estimates of the quality of the solutions are given in any 
of these papers, though Cwilich et al. provably find the optimal solution to the linear 
relaxation of their formulation. Brightwell, Oriolo and Shepherd [2] study the problem 
of designing restoration capacity for a single demand, and give a 1 + approximation 
algorithm for this problem. Cwilich et al. [4] present an empirical comparison of column 
generation with a heuristic approach that is fast enough for use on large networks, but 
without performance guarantees. We know of no prior approximation guarantees for the 
restoration capacity planning problem with multiple demand pairs. 

Alevras, Grötschel and Wessäly [1] mention some computational results on solving
mixed integer linear programs that model restoration capacity planning problems related 
to ours. Their largest instance has 17 nodes, 64 edges, and 106 demand pairs. They report 
“reasonable” solutions in times ranging from a few seconds to several hours. Iraschko, 
MacGregor and Grover [9] report computational results for their integer programming 
heuristic, but give no estimate of the absolute quality of the solutions. Their running 
times vary from minutes to 2.7 days, and their largest test networks are smaller than our 
smallest test networks. 

The restoration capacity planning problem is rather different from purely
graph-theoretic network-reliability problems such as finding disjoint paths (see for example
Kleinberg's thesis [11]); connectivity augmentation (for example adding a min-cost
set of edges to a graph to increase its connectivity [5,6]); or the generalized Steiner
problem (see e.g. [10]), in which a subgraph must be found that contains r_uv edge-disjoint
paths between a collection of required {u, v} pairs. (Grötschel, Monma and Stoer [7]
survey methods to solve exactly these NP-hard topological connectivity problems.) The
restoration capacity planning problem differs in being capacitated and in having a fixed
underlying network and explicit representation of the set of failures and the affected
demands.



2 The Restoration Capacity Planning Problem 

The input to the restoration capacity planning problem is an undirected network N =
(V, E), where V is a set of nodes and E is a set of edges, and a set of routed demands, R.
We use n = |V| and m = |E|. We assume that N is two-edge connected; otherwise, we
work within each 2-edge-connected component. Each edge has an associated cost per
unit capacity. Each demand d ∈ R is described by a pair of terminations in V, a size or
required capacity, and a service route, which is a simple path between the terminations.
A failure is the deletion of a single edge f ∈ E. A failure affects a demand if the failing
edge occurs in the service route of that demand. The output of a restoration capacity
planning algorithm is a set of restoration routes for each demand. For each failure that
affects the demand, there must be a restoration route that connects the endpoints and
bypasses the failure. The same restoration route may be used for different failures, as
long as it avoids all the failing edges.

The objective is to minimize the cost of the solution, which is determined by the
edge capacities. These are determined in turn from the restoration routes. First, for each
failure f, determine the collection of restoration paths that will be used. For each edge e
in the network, determine how much capacity, cap(e, f), is needed to carry the protection
paths that are in use for the failure f. The required restoration capacity of edge e is the
maximum over failures f of cap(e, f). In this way restoration capacity is shared between
different failures.
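The sharing rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the input layout (a dictionary keyed by demand and failure) are assumptions made for the example, and it ignores the refinement, described later in this section, that a demand may reuse its own service capacity for free.

```python
from collections import defaultdict

def restoration_capacity(restoration_routes, sizes):
    """Compute per-edge restoration capacity shared across failures.

    restoration_routes: {(demand, failure): [node, node, ...]} (hypothetical layout)
    sizes: {demand: required capacity}
    Returns {edge: max over failures f of cap(e, f)}, with edges as frozensets.
    """
    # load[f][e] plays the role of cap(e, f) in the text.
    load = defaultdict(lambda: defaultdict(float))
    for (d, f), path in restoration_routes.items():
        for u, v in zip(path, path[1:]):
            load[f][frozenset((u, v))] += sizes[d]
    # Only one failure happens at a time, so take the maximum, not the sum.
    cap = defaultdict(float)
    for per_edge in load.values():
        for e, c in per_edge.items():
            cap[e] = max(cap[e], c)
    return dict(cap)
```

For two unit demands whose restoration routes both cross edge {a, c} but under different failures, the edge needs one unit of restoration capacity rather than two.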

There are two variants to the cost model: in the total cost model we seek to minimize 
the cost of service and restoration capacity combined, while in the restoration cost model, 
we consider only the cost of restoration capacity. This distinction is useful in describing 
the performance of approximation algorithms, and it represents two extremes in practical 
applications. When planning restoration capacity for an existing network, it is important 
to minimize just the restoration capacity. If a network design is to be generated as part 
of an architectural study, for example to investigate the total impact of a particular 
architectural decision, then the total cost may be a more suitable metric. 

For the applications that inspired this paper, the edges are long-haul optical fiber 
routes, and the nodes are cities or fiber junctions. Due to advances in optical multiplexing 
the number of wavelengths that can be transmitted down a fiber optic cable is effectively 
unlimited, but each transmitted wavelength requires a pair of lasers at either endpoint of 
the cable and a mileage-dependent number of repeaters along the line. This determines 
the cost per unit capacity. 

Our formulation has several aspects that model actual network restoration techniques. 
First, the route that a demand follows cannot change unless it is actually interrupted by 
a failure (this is termed “strong resilience” by Brightwell et al. [2]). This precludes 
arbitrary rearrangements of the traffic after a failure. The latter approach might yield 
cheaper overall cost, but it is both undesirable to interrupt customer service and hard 
to do so reliably and safely. Second, a restoration route for a given demand can use as 
much of its own service capacity as is useful in the restoration route, but it cannot use 
any other service capacity that may have been freed up by moving other demands from 
their service routes, for similar reasons of reliability and safety. Finally, the models and 
algorithms in this paper can be generalized to multiple failures, but in practice such 
events occur with sufficiently low probability that the cost of building enough capacity 
to handle them is not justified. 

3 A (1 + ε)-Approximation Algorithm

This section describes an algorithm, based on linear programming and randomized
rounding [13], that produces restoration plans whose cost is provably within a (1 + ε)
factor of optimal.

The linear program essentially conjointly solves a collection of multicommodity 
flows, one per failure. To reduce the total number of commodities, all demands that have 
the same service route are aggregated together into a demand group. Let G denote the 
resulting set of demand groups. Let F denote the set of failures, which in our case is 
isomorphic to the set of edges E. 

The linear program is shown in Figure 1. For each pair (g, f) where g is a demand
group whose service route is affected by the failure f, there is a commodity. Variables
flow(g, f, u, v) and flow(g, f, v, u) give the flow between u and v in either direction,
subject to flow conservation (constraint 4) and the additional constraint that there is no
flow on a failed edge (constraint 1). The non-negative variable cap(e) represents the
restoration capacity of edge e, which is the maximum required for any failure
(constraint 3). A demand group can use its own service capacity for free. The (arbitrary)
choice of source and sink for the endpoints of the demand group determines a direction
of service flow.



Constants:

supply(g, v)         Supply for demand group g at node v
service(g, u, v)     Service flow of demand group g from u to v on e = {u, v}
cost(e)              Unit cost of provisioning capacity on edge e

Variables, all non-negative:

flow(g, f, u, v)     Flow of group g on edge e = {u, v} from u to v, under failure f
restcap(g, f, u, v)  Restoration capacity on e = {u, v} required for group g under failure f
cap(e)               Total restoration capacity of edge e

Minimize $\sum_{e \in E} \mathrm{cost}(e) \cdot \mathrm{cap}(e)$ subject to:

\begin{align}
\mathrm{flow}(g,f,u,v) = \mathrm{flow}(g,f,v,u) &= 0
  && \forall \{u,v\} = f \in F,\ g \in G \tag{1} \\
\mathrm{restcap}(g,f,u,v) &\ge \mathrm{flow}(g,f,u,v) - \mathrm{service}(g,u,v)
  && \forall g \in G,\ f \in F,\ u,v \mid \{u,v\} \in E \tag{2} \\
\mathrm{cap}(e = \{u,v\}) &\ge \sum_{g} \bigl(\mathrm{restcap}(g,f,u,v) + \mathrm{restcap}(g,f,v,u)\bigr)
  && \forall e \in E,\ f \in F \tag{3} \\
\sum_{x : \{x,v\} \in E} \mathrm{flow}(g,f,x,v) &= \sum_{x : \{v,x\} \in E} \mathrm{flow}(g,f,v,x) + \mathrm{supply}(g,v)
  && \forall f \in F,\ g \in G \tag{4}
\end{align}

Fig. 1. The linear program large-LP



The non-negativity of variables restcap(g, f, u, v) enforces the requirement that a
given demand cannot use (for restoration) service capacity that has been freed up by
other demands that have been rerouted. If this requirement were removed, the linear
program could be simplified by eliminating the restcap variables.

S.J. Phillips and J.R. Westbrook

After solving the linear program, we perform the following randomized rounding
step. The flow is first partitioned between the demands in each demand group in the
natural way. Then for each demand we perform a random walk guided by the flow
values. Specifically, for a demand d between s and t, let A be the set of edges leaving
s. For e ∈ A let f(e) be the flow of d on e. We choose an edge e ∈ A with probability
f(e) / Σ_{e' ∈ A} f(e'). The random walk is continued from the other end of the chosen edge,
until t is reached.
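A minimal Python sketch of this random walk follows. The function name and the flow representation (a dictionary of directed arc flows for one demand) are assumptions for illustration, and the sketch presumes that the per-demand flow conserves flow at intermediate nodes, so that every node reached before t has positive outgoing flow.

```python
import random

def random_walk_route(flow, s, t, rng=random):
    """Round a fractional flow for one demand to a single s-t path.

    flow: {(u, v): flow of the demand from u to v} (hypothetical layout).
    At each node the next arc is chosen with probability proportional
    to the flow the demand sends along it.
    """
    path = [s]
    node = s
    while node != t:
        out = [(v, f) for (u, v), f in flow.items() if u == node and f > 0]
        heads = [v for v, _ in out]
        weights = [f for _, f in out]
        node = rng.choices(heads, weights=weights)[0]
        path.append(node)
    return path
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing several rounding attempts.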

3.1 Analysis 

The number of variables in the linear program is O(m²|G|), since there are m edges, m
failures and |G| demand groups. Clearly this is polynomial in the size of the input.

We now analyze the performance of the algorithm. For simplicity we assume that all
demands have size 1; the analysis generalizes to the general case. By induction one can
show that the probability that the random walk for a demand d crosses an edge e is equal
to the flow of d on e. Now consider the indicator variable x(d, e, f) that is 1 iff demand
d crosses edge e on failure f: for fixed e and f the variables x(d, e, f) are independent
Bernoulli variables. We use the following Chernoff bound [12].

Lemma 1. Let X_1, ..., X_n be a set of independent Bernoulli variables such that
P[X_i = 1] = p_i and P[X_i = 0] = 1 − p_i. Let Y = Σ_i X_i, so E[Y] = Σ_i p_i. Then for ε ∈ [0, 1],

P[|Y − E[Y]| > εE[Y]] <

Applying this bound gives the following result: 

Theorem 2. Suppose that the service capacity of each edge is at least ^ log 2m. Then
with probability at least 1/2 the cost of the restoration plan produced by the above
algorithm is at most 1 + ε times that of the optimal restoration plan.

Proof. For an edge e let cap(e) be the capacity given in the solution of the linear
program. Consider the event that after the randomized rounding a particular edge e requires
capacity more than (1 + ε)cap(e) under a particular failure f. Applying Lemma 1, we
have that the probability of this event is at most 1/(2m²). There are m² such bad events,
so with probability at least 1/2 none of them occurs, and the capacity required on each
edge after randomized rounding is at most 1 + ε times that of the linear program solution.

It is not too unreasonable to have a restriction on the service capacity, since
telecommunications networks use discrete capacities. For example, in a network of OC48s
designed to carry T3s, the minimum capacity of an edge is 48. If the network has 30
edges then we get ε = 0.66, and if the linear program solution has at least 2 OC48s per
edge we can use ε = 0.47.

The bound proven in Theorem 2 is a worst-case bound, and it is worth noting that
the algorithm does not depend on the choice of ε, and would perform better than the
theoretical bound in practice. Note also that the solution of the linear program provides
a lower bound on the cost of the optimal restoration plan. This lower bound is at least as
tight as the linear program of Section 4, at the price of requiring the solution of a linear
program with a much larger number of variables.

4 A Linear Program Lower Bound

This section describes a linear program that gives a lower bound on the cost of the
optimal routing of both service and protection. The linear program was first presented
in [4]. The number of variables in the linear program is independent of |G|, so it
is substantially smaller than the LP of Section 3. In the following section, we use this
lower-bound LP to construct a solution whose total cost is within (2 + ε) of the optimal
total cost for a given demand set R.

The intuition behind the lower bound LP is as follows. Choose a particular demand
d and its service route p. Suppose we restrict restoration routes so that if p is cut by the
failure of edge f, then the restoration route p' must consist of the prefix of p up to the tail
of f, some path from the tail to the head of f, and then the remaining suffix of p from the
head of f onwards. In this restricted setting, a lower bound on the restoration cost can
be computed by aggregating all traffic through each possible failure f, and conjointly
computing flows from the tail to the head of each f so as to minimize the maximum
over flows of the cost. To extend this to the general case, we do not charge for flow that
is traveling along service paths in the reverse direction.

The solution to the linear program is only a lower bound on required restoration 
capacity, because the aggregation of demands for each failure allows feasible solutions 
that do not correspond to any set of restoration routes, and because the solution may be 
fractional. 

The lower bound linear program lowerbound-LP is shown in Figure 2. There is a
commodity for each edge f in the network, which represents the service traffic that must
be rerouted when edge f fails. This is set equal to the flow of service traffic across the
edge. The source of the commodity is arbitrarily chosen to be one endpoint of f, and
the sink the other endpoint. The choice of source and sink implies a direction of flow for
all service routes crossing the edge (the source of the service route is connected to the
source of commodity f). The constant shared(f, e) denotes the service traffic that uses
both the failing edge f and the edge e. Let service(e) refer to all service traffic on e.
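The constants just described can be derived directly from the service routes. The sketch below is one possible way to do so; the function name and data layout are assumptions for illustration, with `traffic[f]` standing for the size of the commodity for edge f (which also equals service(f)).

```python
from collections import defaultdict

def lower_bound_constants(service_routes, sizes):
    """Derive per-failure traffic and shared(f, e) from service routes.

    service_routes: {demand: [node, node, ...]} (hypothetical layout)
    sizes: {demand: required capacity}
    Returns (traffic, shared) with edges represented as frozensets.
    """
    traffic = defaultdict(float)  # traffic[f]: service traffic crossing f
    shared = defaultdict(float)   # shared[(f, e)]: traffic using both f and e
    for d, path in service_routes.items():
        edges = [frozenset(pair) for pair in zip(path, path[1:])]
        for f in edges:
            traffic[f] += sizes[d]
            for e in edges:
                if e != f:
                    shared[(f, e)] += sizes[d]
    return dict(traffic), dict(shared)
```

Because service routes are simple paths, each edge appears at most once per route, so no traffic is double counted.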

Standard network flow constraints (constraint 8) generate a flow between source and
sink. The variable cap(e) represents the restoration capacity of edge e, and constraint 5
ensures the restoration capacity is enough to handle any edge failure. The objective of
the linear program is to minimize the total cost of restoration capacity.

Theorem 3. The optimum solution of the lower bound LP is a lower bound on the 
optimum solution of the large LP. 

Proof. [4] Let R_f denote the set of service paths crossing edge f. From the restoration
routes for R_f we construct a feasible solution to the linear program whose cost is no
more than the cost of the restoration routes. Consider d ∈ R_f with service path p and
restoration path p_f avoiding f. For the commodity for edge f in the linear program, we
route the contribution from d as follows. From each endpoint of f the commodity
follows p in the reverse direction until hitting a node on p_f. Between those two nodes
the commodity flows forward along p_f. Only the latter part of the route contributes to
the cap(·) variables, because along the rest of the route d contributes equally to both
flow(f, ·) and shared(f, ·).

Constants:

demand(f, v)         Supply for edge f commodity at node v
shared(f, {u, v})    Service flow of commodity f on edge e = {u, v}
cost(e)              Unit cost of provisioning capacity on edge e

Restoration flow variables, all non-negative:

flow(f, u, v), flow(f, v, u)  Forward and backward flow on edge e = {u, v} of commodity f
cap(e)                        Restoration capacity of edge e

Minimize $\sum_{e \in E} \mathrm{cost}(e) \cdot \mathrm{cap}(e)$ subject to

\begin{align}
\mathrm{cap}(e = \{u,v\}) &\ge \mathrm{flow}(f,u,v) + \mathrm{flow}(f,v,u) - \mathrm{shared}(f,\{u,v\})
  && \forall e \in E,\ f \in F \tag{5} \\
\mathrm{flow}(f,u,v) = \mathrm{flow}(f,v,u) &= 0
  && \forall f = \{u,v\} \in F \tag{7} \\
\sum_{x : \{x,v\} \in E} \mathrm{flow}(f,x,v) &= \sum_{x : \{v,x\} \in E} \mathrm{flow}(f,v,x) + \mathrm{demand}(f,v)
  && \forall f \in F,\ v \in V \tag{8}
\end{align}

Fig. 2. The linear program Lower-LP



The linear program has O(m) commodities, each of which has O(m) flow variables.
Hence the LP has O(m²) variables and O(mn) constraints. Note that we could
alternatively use a path-generation formulation of the lower bound. Such a formulation
may be faster in practice than the LP presented here, but it has exponential behavior in
the worst case.



5 From Lower Bound to Approximation Algorithm 

The algorithm LBAlg starts with an optimum (fractional) solution of lowerbound-LP,
and produces a restoration plan. The bounds below work for demands in the range (0,1],
but for simplicity of exposition we assume unit demands. We first present a simple
version that allows a short proof of the approximation ratio, followed by the efficient
version, optimized for performance in practice, that is used in the empirical evaluation
of the next section.

The simple version consists of the following three steps:

1. Path extraction. For each edge failure f, the flow variables flow(f, ·, ·) from
lowerbound-LP are used to generate a set of paths between the endpoints of f as follows.

a) First, for each edge e = {u, v}, set the capacity of e to be
cap(e) = ⌈flow(f, u, v) + flow(f, v, u)⌉.

b) Since the edge capacities are integral, we can then find an integral flow of size
demand(f, u) from u to v, the endpoints of f, satisfying the edge capacities.

c) From the integer flow it is simple to produce a set of demand(f, u) paths from
u to v, such that the number of paths on an edge e is at most cap(e).

2. Path assignment. The restoration paths are arbitrarily matched to the service paths
that use f.

3. Splicing. Each restoration path is spliced into its matched service path. See Figure 3.
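Step 1(c) is a standard flow decomposition. The following sketch shows one way to peel unit paths off an integral flow; the function name and the arc-dictionary layout are assumptions for illustration, and the sketch presumes the flow conserves flow at every node other than s and t and contains no directed cycles.

```python
def extract_paths(int_flow, s, t):
    """Decompose an integral s-t flow into unit paths.

    int_flow: {(u, v): non-negative integer flow on arc u -> v}
    (hypothetical layout). Returns a list of node lists, one per unit of flow.
    """
    remaining = {arc: f for arc, f in int_flow.items() if f > 0}
    paths = []
    # Each pass removes one unit of flow along some s-t path.
    while any(u == s for (u, _v) in remaining):
        path = [s]
        while path[-1] != t:
            u = path[-1]
            v = next(y for (x, y) in remaining if x == u)
            remaining[(u, v)] -= 1
            if remaining[(u, v)] == 0:
                del remaining[(u, v)]
            path.append(v)
        paths.append(path)
    return paths
```

Each arc is used by at most as many paths as it carried units of flow, which is the property step 1(c) requires.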




Fig. 3. Splicing a restoration path between endpoints of failed edge / into service path 
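The splicing operation of Figure 3 amounts to replacing the failed edge in the service path with the detour. A minimal sketch follows, with a hypothetical function name and paths given as node lists; note that the spliced route may revisit nodes of the service path, which this sketch does not try to shortcut.

```python
def splice(service_path, failed_edge, restoration_path):
    """Replace the failed edge in a service path with a restoration path.

    service_path: list of nodes; failed_edge: (u, v), consecutive on the path;
    restoration_path: list of nodes joining u and v in either orientation.
    """
    u, v = failed_edge
    for i in range(len(service_path) - 1):
        a, b = service_path[i], service_path[i + 1]
        if {a, b} == {u, v}:
            # Orient the detour so it starts where the service path enters f.
            detour = restoration_path if restoration_path[0] == a else restoration_path[::-1]
            return service_path[:i] + detour + service_path[i + 2:]
    raise ValueError("failed edge is not on the service path")
```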



Theorem 4. Assume the minimum service capacity on an edge is at least 1/ε. Then
algorithm LBAlg is a (2 + ε)-approximation algorithm under the total-cost measure.

Proof. During path extraction at most ⌈flow(f, u, v) + flow(f, v, u)⌉ restoration paths
are generated that cross edge e = {u, v}. Applying constraint (5), the restoration capacity
required by algorithm LBAlg on failure f is at most

\begin{align*}
\lceil \mathrm{cap}(e) \rceil + \mathrm{shared}(f, e)
  &< \mathrm{cap}(e) + 1 + \mathrm{service}(e) \\
  &\le \mathrm{cap}(e) + \mathrm{service}(e)(1 + \epsilon) \quad \text{(by hypothesis)}
\end{align*}

and therefore the cost due to restoration capacity of algorithm LBAlg is at most the cost
of lowerbound-LP plus (1 + ε) times the cost of the service capacity.

The efficient version of LBAlg uses only the per-edge restoration capacities cap(·)
from the lowerbound-LP, rather than the paths, and is structured as follows.

1. Capacity Rounding. Obtain a solution to lowerbound-LP with integer edge capacities:

a) Sort the edges in increasing order of the fractional part of cap(·).

b) For each edge e in this order, do the following:

i. Round down cap(e).

ii. For each edge failure f, test whether the demand for f can still be routed
within the edge capacities.

iii. If the test fails for some f, increment cap(e).

2. Path Generation.

a) Start with the edge capacities from Step 1.

b) Then for each edge failure f in turn, generate restoration routes for all demands
affected by f so as to minimize the cost of the implied increase in edge capacities.
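Step 1 can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes that the routing test for a failure f is a single-commodity maximum flow between the endpoints of f in which each surviving edge e offers cap(e) + shared(f, e), as constraint (5) suggests. The function names, the tuple representation of edges, and the tolerance `EPS` are all assumptions.

```python
import math
from collections import deque

EPS = 1e-9  # tolerance for floating-point capacities (assumed)

def max_flow(nodes, und_cap, s, t):
    """Edmonds-Karp maximum flow on an undirected capacitated graph."""
    res = {x: {} for x in nodes}
    for (u, v), c in und_cap.items():
        res[u][v] = res[u].get(v, 0.0) + c
        res[v][u] = res[v].get(u, 0.0) + c
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y, r in res[x].items():
                if r > EPS and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return total
        bottleneck, y = math.inf, t
        while parent[y] is not None:
            bottleneck = min(bottleneck, res[parent[y]][y])
            y = parent[y]
        y = t
        while parent[y] is not None:
            x = parent[y]
            res[x][y] -= bottleneck
            res[y][x] += bottleneck
            y = x
        total += bottleneck

def round_capacities(nodes, cap, shared, demand):
    """Round fractional cap(e) to integers, edge by edge, keeping feasibility.

    cap: {(u, v): fractional capacity}; shared: {(f, e): traffic};
    demand: {f: traffic to reroute when edge f = (u, v) fails}.
    """
    cap = dict(cap)

    def feasible():
        for f, d in demand.items():
            avail = {e: cap[e] + shared.get((f, e), 0) for e in cap if e != f}
            if max_flow(nodes, avail, f[0], f[1]) < d - EPS:
                return False
        return True

    for e in sorted(cap, key=lambda e: cap[e] - math.floor(cap[e])):
        lowered = math.floor(cap[e])
        cap[e] = lowered              # step i: round down
        if not feasible():            # step ii: test every failure
            cap[e] = lowered + 1      # step iii: increment
    return cap
```

On a small network with two parallel two-hop detours each carrying half a unit, the procedure rounds one detour up to a full unit and the other down to zero.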

For a single edge failure f, the problem of generating restoration routes (Step 2)
can be represented by an integer min-cost multicommodity flow problem, with a
commodity for each demand group affected by f. Define the restoration flow for commodity
d on an edge e to be the amount by which the flow of d on e exceeds that of the service routing.
The cost of a unit of restoration flow crossing an edge e is 0 for the first c units of flow,
where c is the current capacity of e, and the edge cost of e for further units of flow.
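The piecewise cost on a single edge can be written down directly. The sketch below uses a hypothetical function name and per-commodity dictionaries for one edge; it only evaluates the cost of a given flow and is not a solver.

```python
def edge_restoration_cost(flows, service, current_cap, unit_cost):
    """Cost charged on one edge for a given set of commodity flows.

    flows: {commodity: flow on this edge}; service: {commodity: service flow
    on this edge}. Restoration flow is the excess over the service routing;
    the first current_cap units of restoration flow are free.
    """
    restoration = sum(max(0, flows[d] - service.get(d, 0)) for d in flows)
    return unit_cost * max(0, restoration - current_cap)
```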

There are a number of exact and approximate algorithms for min-cost multicommodity
flow. We chose a three-stage implementation, and found it to work very well in
practice (see Section 6). However, there is room for improvement in the running time
of our implementation, and it is worth pursuing other approaches to solving the flow
problem, for example combinatorial approximation algorithms or a linear program with a
column generation formulation.

Our implementation uses three steps to solve the min-cost multicommodity flow 
problem, in increasing order of running time and power: 

1. Greedy. Attempt to route each demand in turn, using an integer capacitated max-flow
subroutine and decrementing the available edge capacities after each demand. If all
demands are routed within the current edge capacities, we have found a zero-cost
solution to the multicommodity flow problem. Repeat this process for a number of
random permutations of the demands.

2. Short Multicommodity Flow. If the greedy routing fails, construct the following
aggregated and constrained version of the multicommodity flow problem.

a) Solve a capacitated flow problem to generate a flow of the appropriate size
between the endpoints of the failed edge (similar to the flows in lowerbound-LP).

b) Break each demand d into three segments d_1, d_2, d_3, with d_1 and d_3 maximal
such that none of their internal nodes are reached by the flow from 2a.

c) Create a graph h containing the union of the edges in all d_2 and those containing
flow from 2a.

d) Aggregate demands between the same endpoints of h, and solve the reduced
multicommodity flow problem.

e) If a zero-cost solution is found, then combine the resulting restoration paths with
the appropriate d_1 and d_3 to produce a zero-cost solution to the large problem.

3. Long Multicommodity Flow. If the short multicommodity flow does not produce a
zero-cost solution, solve the full multicommodity flow.
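The greedy stage can be sketched as below under simplifying assumptions: all demands have unit size (so a breadth-first path search stands in for the max-flow subroutine), and `cap` already counts every unit of capacity a demand is allowed to use, including any of its own service capacity. The function names, the adjacency-list layout and the default number of tries are illustrative choices, not taken from the paper.

```python
import random
from collections import deque

def find_path(adj, cap, s, t, failed):
    """Breadth-first search for an s-t path over edges with a free unit."""
    parent = {s: None}
    queue = deque([s])
    while queue and t not in parent:
        x = queue.popleft()
        for y in adj[x]:
            e = frozenset((x, y))
            if y not in parent and e != failed and cap.get(e, 0) >= 1:
                parent[y] = x
                queue.append(y)
    if t not in parent:
        return None
    path = [t]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def greedy_route(adj, cap, demands, failed, tries=10, seed=0):
    """Try random demand orders; return {demand: path} or None if all fail.

    adj: {node: [neighbours]}; cap: {frozenset edge: free integer capacity};
    demands: {name: (s, t)}, all of unit size; failed: frozenset edge.
    """
    rng = random.Random(seed)
    order = list(demands)
    for _ in range(tries):
        rng.shuffle(order)
        left = dict(cap)
        routes = {}
        for d in order:
            s, t = demands[d]
            path = find_path(adj, left, s, t, failed)
            if path is None:
                break  # this permutation failed; try another
            for u, v in zip(path, path[1:]):
                left[frozenset((u, v))] -= 1
            routes[d] = path
        else:
            return routes  # every demand fit: a zero-cost solution
    return None
```

A `None` result corresponds to falling through to the short multicommodity flow stage.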



To solve the short and long multicommodity flows, our implementation generates
flow-based formulations using AMPL and solves them using an interior-point algorithm in
CPLEX. We find empirically that the greedy step is successful for about 90% of the
edges, and the short multicommodity flow has a zero-cost solution in the majority of the
remaining cases. Thus the number of long multicommodity flow problems that must be
solved is very small, under 5% of the number of edges.

6 Experimental Results 

This section describes an experimental evaluation of algorithm LBAlg. The data sets 
(also used in [4]) are for four two-connected networks, sized as follows: 




The unit capacity cost for an edge is 300 plus the edge length in miles. The constant 
300 was chosen to represent the relative cost of equipment placed only at the end of a 
link, such as switch ports, compared to equipment which is placed at regular intervals 
along a link, such as optical amplifiers. Edge lengths ranged from 0.1 to 992 (measured 
in miles), giving edge costs in the range 300.1 to 1292. 

For each network, an actual matrix of traffic forecasts between approximately 600 
US cities and towns was first mapped to the network nodes, then concentrated into large 
units of bandwidth, resulting in problems of moderate size, with unit demand sizes. Two 
different forecasts were used for each of A, B and C, and one for D, resulting in the 
following demand sets: 



demand set   number of demands
A1           178
A2           258
B1           260
B2           465
C2           679
D1           2120


The algorithm LBAlg was run on these demand sets, giving the following results.
The table shows the cost of the service routing, the cost of the extra capacity required
for restoration in the lower bound and for LBAlg, the ratio of the cost of restoration
capacity required by LBAlg and the lower bound, and lastly the ratio of the total cost
(service plus restoration) of LBAlg compared to the lower bound.



demand set   LB        LBAlg     LBAlg / LB   LB Runtime   Extraction Runtime
A1           971636    975339    1.003        30           3:01
A2           1043934   1046977   1.003        29           3:55
B1           1139824   1147916   1.007        1:02         3:28
B2           1367741   1371754   1.003        1:04         7:13
C1           1245997   1257945   1.006        28:15        9:32
C2           1656629   1667173   1.006        27:33        35:04
D1           2271209   2291555   1.009        1:15:32      4:04:46






As can be seen, the restoration plans produced by LBAlg are within 1% of the lower
bound, and therefore within 1% of optimal. This is far better than Theorem 4 would
suggest. There is some opportunity to reduce the running time of the extraction process
by using a more efficient combinatorial algorithm for exact or approximate
multicommodity flow, or by using column generation instead of a flow-based representation of
the multicommodity flow linear program.

For comparison, the next table (extracted from [4]) presents the performance of 
the column generation algorithm of [3] on the three smaller networks. This algorithm 
exactly solves a linear relaxation of the restoration capacity planning problem, then 
applies some heuristics to generate an integer solution. Network D is too large for the 
column generation approach to handle. 



demand set   Col Gen / LB   Col Gen runtime (minutes)
A1           1.010          38:00
A2           1.010          53:00
B1           1.007          2:06:00
B2           1.003          4:08:00
C1           1.004          23:43:00
C2           1.005          59:33:00


The restoration plans produced by LBAlg are as good as those produced by column 
generation, while the large reduction in running time enables the solution of much larger 
problems than was previously possible. 



7 Combined Optimization of Service and Restoration 

In this section we consider the problem of building capacity to handle both service and 
restoration at minimum total cost. That is, we compute service and restoration paths 
together. 

The linear program of Section 3 can be extended without much difficulty to handle the 
more general case. The idea is to add a service multicommodity flow to the collection of 
restoration multicommodity flows. The service flow constants in large-LP are replaced 
by service flow variables. 

The linear program is shown in Figure 4. A demand group now contains all demands 
between a given pair of nodes. Applying an analysis similar to that of section 3 gives the 
following theorem. 

Theorem 5. Suppose that the service capacity of each edge is at least ^ log 2m. Then
with probability at least 1/2 the cost of the service and restoration routes produced by
the above algorithm is at most 1 + ε times that of the optimal cost.

The above LP uses O(m²|G|) flow variables. We can use the lower-bound algorithm
to get a (4 + ε)-approximate solution with many fewer variables. The idea is simple:
for the service routes, simply use shortest paths in the network according to the cost per
unit capacity values. Then input these service routes into the lower-bound algorithm.
The bound on the approximation ratio follows from the following lemma.




Approximation Algorithms for Restoration Capacity Planning 






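Constants:

supply(g, v)         Supply for demand group g at node v
cost(e)              Unit cost of provisioning capacity on edge e

Variables, all non-negative:

sflow(g, u, v)       Service flow of demand group g from u to v on e = {u, v}
rflow(g, f, u, v)    Flow of group g on edge e = {u, v} from u to v, under failure f
restcap(g, f, u, v)  Restoration capacity on e = {u, v} required for group g under failure f
cap(e)               Total service and restoration capacity of edge e

Minimize $\sum_{e \in E} \mathrm{cost}(e) \cdot \mathrm{cap}(e)$ subject to:

\begin{align}
\mathrm{rflow}(g,f,u,v) = \mathrm{rflow}(g,f,v,u) &= 0
  && \forall \{u,v\} = f \in F,\ g \in G \tag{10} \\
\mathrm{restcap}(g,f,u,v) &\ge \mathrm{rflow}(g,f,u,v) - \mathrm{sflow}(g,u,v)
  && \forall g \in G,\ f \in F,\ u,v \mid \{u,v\} \in E \tag{11} \\
\mathrm{cap}(e = \{u,v\}) &\ge \sum_{g} \bigl(\mathrm{sflow}(g,u,v) + \mathrm{sflow}(g,v,u)\bigr) \notag \\
  &\quad + \sum_{g} \bigl(\mathrm{restcap}(g,f,u,v) + \mathrm{restcap}(g,f,v,u)\bigr)
  && \forall e \in E,\ f \in F \tag{12} \\
\sum_{x : \{x,v\} \in E} \mathrm{sflow}(g,x,v) &= \sum_{x : \{v,x\} \in E} \mathrm{sflow}(g,v,x) + \mathrm{supply}(g,v)
  && \forall f \in F,\ g \in G \tag{14} \\
\sum_{x : \{x,v\} \in E} \mathrm{rflow}(g,f,x,v) &= \sum_{x : \{v,x\} \in E} \mathrm{rflow}(g,f,v,x) + \mathrm{supply}(g,v)
  && \forall f \in F,\ g \in G \tag{15}
\end{align}

Fig. 4. The linear program gen-LP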



Lemma 6. Let OPT denote the optimum cost of a solution for the service and restoration
planning problem. There is a solution in which the service routing follows shortest paths
(according to cost per unit capacity) with total cost at most 2 · OPT.

Proof. Let O be an optimum solution for the service and restoration planning problem. 
We will construct a solution to the restricted problem, at most doubling the cost. 

Consider a demand d and a fault / on the shortest path between the endpoints of d. 
The demand d is affected by / in the restricted problem, so we must specify a restoration 
route for it. 

If d is affected by f in O, then we use the restoration path for d on failure f in O as
the restoration path for the restricted problem. Otherwise, the service route for d in O
does not contain f, so we use it as the restoration path for the restricted problem.

The cost of the restoration capacity for the restricted problem is at most the sum of 
the service and restoration costs of O. Furthermore, the service cost of the restricted 
problem is at most the service cost of O, so the total cost of the restricted problem is at 
most double that of O. 

The bound of Lemma 6 is tight, as can be seen from the following example. Let the
network consist of k + 1 parallel edges from s to t and the demands consist of k unit
demands from s to t. Let one edge have cost $1 - \varepsilon$ and the others have cost 1. Then in the







S.J. Phillips and J.R. Westbrook 



shortest paths solution, all traffic routes over the edge of weight $1 - \varepsilon$, necessitating a
total of k units of restoration on other edges. An optimal solution routes 1 unit of demand
on each of k edges, necessitating only a single unit of restoration capacity on the
(k + 1)st edge. As k grows the cost ratio tends to 2.
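The arithmetic behind this example can be checked with a minimal sketch. It assumes that the spread solution places one unit of service on the cheap edge and on k − 1 of the unit-cost edges, and that restoration capacity is bought only on unit-cost edges; the function names are illustrative and not part of the paper.

```python
def shortest_path_cost(k: int, eps: float) -> float:
    """Total cost when all k unit demands are routed on the cheapest edge."""
    service = k * (1.0 - eps)   # k units of service on the edge of cost 1 - eps
    restoration = k * 1.0       # k units of restoration on edges of cost 1
    return service + restoration


def spread_cost(k: int, eps: float) -> float:
    """Total cost when the k demands use k distinct edges (assumed layout)."""
    service = (1.0 - eps) + (k - 1) * 1.0  # one unit on each of k edges
    restoration = 1.0                      # one spare unit on the remaining edge
    return service + restoration


def cost_ratio(k: int, eps: float) -> float:
    """Ratio of the shortest-paths solution to the spread solution."""
    return shortest_path_cost(k, eps) / spread_cost(k, eps)


if __name__ == "__main__":
    for k in (1, 10, 100, 10_000):
        print(k, round(cost_ratio(k, 0.01), 4))
```

Under these assumptions the ratio equals $k(2 - \varepsilon)/(k + 1 - \varepsilon)$, which approaches $2 - \varepsilon$ as k grows and hence 2 as $\varepsilon \to 0$.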

Theorem 7. Assume the minimum service capacity under shortest-paths routing is at
least $1/\varepsilon$. Then algorithm LBAlg is a $(4 + \varepsilon)$-approximation algorithm under the total-
cost measure.

Proof. The proof follows by combining Theorem 4 with Lemma 6.

8 Remarks 

Theorem 7 can be strengthened to a $(2 + \varepsilon)$-approximation guarantee by using a lower-bound
LP that more carefully combines a service flow calculation with restoration capacity
planning.

As stated in the introduction, our model limits restoration plans in certain ways that 
correspond to standard practice in network planning. Our methods can be adapted to 
handle variations in the model. For example, we can allow service routes to change 
upon an edge failure even if the old route is unaffected by the failure, or we can allow 
a restoration path for one demand to use service capacity belonging to another demand, if
the other demand has also been rerouted onto a service route. 

The specifics of the model sometimes allow us to reduce the number of variables in the
$(1 + \varepsilon)$-approximate flow formulation. We are also generally able to achieve a constant-
approximate solution for these variations, for a small constant, with $O(m^2)$ variables,
using the lower-bound approach. Details are omitted from this extended abstract.

Abstract. Given a bounded integer program with n variables and m constraints,
each with 2 variables, we present an $O(mU)$ time and $O(m)$ space feasibility
algorithm for such integer programs (where U is the maximal variable range size).
We show that with the same complexity we can find an optimal solution for the
positively weighted minimization problem for monotone systems. Using the local-
ratio technique we develop an $O(nmU)$ time and $O(m)$ space 2-approximation
algorithm for the positively weighted minimization problem for the general case.
We further generalize all results to non-linear constraints (called axis-convex
constraints) and to non-linear (but monotone) weight functions.

Our algorithms are not only better in complexity than other known algorithms, but 
they are also considerably simpler, and contribute to the understanding of these 
very fundamental problems. 



Keywords: Combinatorial Optimization, Integer Programming, Approximation Algo- 
rithm, Local Ratio Technique, 2SAT, Vertex Cover. 

1 Introduction 

This paper is motivated by a recent paper of Hochbaum, Megiddo, Naor and Tamir 
[10], which discusses integer programs with two variables per constraint. The problem 
is defined as follows: 

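\begin{align*}
\text{(2VIP)} \quad \min\ & \sum_{i=1}^{n} w_i x_i \\
\text{s.t.}\ & a_k x_{i_k} + b_k x_{j_k} \ge c_k && \forall k \in \{1, \ldots, m\} \\
& \ell_i \le x_i \le u_i && \forall i \in \{1, \ldots, n\}
\end{align*}

where $1 \le i_k, j_k \le n$, $w_i \ge 0$, $a, b, c \in \mathbb{Z}^m$ and $\ell, u \in \mathbb{N}^n$.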

Obviously this problem is a generalization of the well known minimum weight vertex
cover problem (VC) and the minimum weight 2-satisfiability problem (2SAT). Both
problems are known to be NP-hard [6] and the best known approximation ratio for VC
[3,9,11] and 2SAT [8] is 2. Both results are best viewed via the local ratio technique (see
[2,4]).
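To make the generalization concrete, the following minimal sketch encodes a weighted vertex cover instance as a 2VIP instance (one constraint $x_i + x_j \ge 1$ per edge, with bounds $0 \le x_i \le 1$) and solves it by exhaustive search. The tuple layout `(i, j, a, b, c)` and both function names are assumptions made for illustration; the brute-force solver is only a reference for tiny instances and is not the algorithm of this paper.

```python
from itertools import product


def vertex_cover_as_2vip(n, edges, weights):
    """Encode vertex cover as 2VIP: a*x_i + b*x_j >= c with a = b = c = 1."""
    constraints = [(i, j, 1, 1, 1) for (i, j) in edges]
    lower = [0] * n
    upper = [1] * n
    return list(weights), constraints, lower, upper


def brute_force_2vip(weights, constraints, lower, upper):
    """Return (value, x) minimizing sum(w_i * x_i), or None if infeasible."""
    best = None
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    for x in product(*ranges):
        feasible = all(a * x[i] + b * x[j] >= c for (i, j, a, b, c) in constraints)
        if feasible:
            value = sum(w * xi for w, xi in zip(weights, x))
            if best is None or value < best[0]:
                best = (value, x)
    return best


if __name__ == "__main__":
    # Path 0 - 1 - 2 where the middle vertex is cheap.
    instance = vertex_cover_as_2vip(3, [(0, 1), (1, 2)], [3, 1, 3])
    print(brute_force_2vip(*instance))
```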

Hochbaum et al. [10] presented a 2-approximation algorithm for the 2VIP problem.
Their algorithm uses a maximum flow algorithm, therefore the time complexity of their
algorithm is relatively high, i.e., when using Goldberg and Tarjan's maximum flow
algorithm [7] it is $O(nmU^2 \log(n^2U/m))$, where $U = \max_i \{u_i - \ell_i\}$. By using the local-
ratio technique we present a more natural and simpler $O(nmU)$ time and $O(m)$ space
2-approximation algorithm.

In order to develop an approximation algorithm it seems natural to first study the
feasibility problem. Indeed this is done by Hochbaum et al. [10] for the 2VIP problem
and by Gusfield and Pitt [8] for the 2SAT problem. In Sect. 2 we present our $O(mU)$
time and $O(m)$ space feasibility algorithm for 2VIP systems. Section 3 includes the
2-approximation algorithm for linear integer systems. In Sect. 4¹ we show that the
feasibility algorithm and the approximation algorithm presented in this paper can be
generalized to some non-linear systems with the same time and space complexity. We
define a generalization of linear inequalities, called axis-convex constraints, and show
that the algorithms can be generalized to work with such constraints. We also generalize
the 2-approximation algorithm to objective functions of the form $\sum_{i=1}^{n} w_i(x_i)$ where
all the $w_i$'s are monotone weight functions. The optimality algorithm for monotone linear
systems appears in Sect. 5¹. We show that this algorithm can work with some non-linear
constraints, and we generalize the algorithm to monotone weight functions, as well.

Table 1 summarizes our results for 2VIP systems. 



Table 1. Summary of Results

Problem                     | Previous results (time, space)                                          | Our results (time, space)
----------------------------+-------------------------------------------------------------------------+--------------------------
2SAT Feasibility            | O(m), O(m); Even, Itai and Shamir [5]                                   |
2VIP Feasibility            | O(mU), O(mU); by using reduction to 2SAT [10]                           | O(mU), O(m)
2SAT 2-approximation        | O(nm), O(n^2 + m); Gusfield and Pitt [8]                                | O(nm), O(m)
2VIP 2-approximation        | O(nmU^2 log(n^2 U/m)), O(mU); Hochbaum, Megiddo, Naor and Tamir [10]    | O(mnU), O(m)
Monotone 2VIP Optimization¹ | O(mU), O(mU); by using reduction to 2SAT                                | O(mU), O(m)

¹ Omitted from this extended abstract.

2 Feasibility Algorithm

Given a 2VIP system we are interested in developing an algorithm which finds a feasible
solution if one exists. Since the special case when $\ell = 0^n$ and $u = 1^n$ is the known
2SAT feasibility problem, it is natural to try to extend the well known $O(m)$ time and
$O(m)$ space algorithm of Even et al. [5]. It is possible to transform the given 2VIP
system to an equivalent 2SAT instance with $nU$ variables and $(m + n)U$ constraints
(this transformation by Feder appears in [10]). By combining this transformation with
the linear time and space algorithm of Even et al. we get an $O(mU)$ time and $O(mU)$
space feasibility algorithm. In this section we present an $O(mU)$ time and $O(m)$ space
feasibility algorithm which generalizes the algorithm by Even et al.

The main idea of the algorithm of Even et al. is as follows: we choose a variable $x_i$ and
discover the forced values for other variables by assigning $x_i = 0$ and by assigning $x_i = 1$.
If one of these assignments does not lead to a contradiction, we can assign $x_i$ this value
and make the corresponding forced assignments. The correctness of their approach is
achieved by proving that a non-contradictory assignment preserves the feasibility property.
The efficiency of their algorithm is achieved by discovering the forced assignments of
$x_i = 0$ and those of $x_i = 1$ in parallel.

The purpose of this section is not only to show the factor $\Omega(U)$ improvement in space
complexity but also to lay the foundations for the 2-approximation algorithm presented
in the next section.

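Definition 1. For a given 2VIP instance

$\mathrm{sat}(\ell, u) = \{x : \ell \le x \le u \text{ and } x \text{ satisfies all 2VIP constraints}\}$.

Definition 2. For a given constraint k on the variables $x_i, x_j$

$\mathrm{constraint}(k) = \{(\alpha, \beta) : x_i = \alpha, x_j = \beta \text{ satisfy constraint } k\}$.

Definition 3. Given $\alpha, \beta \in \mathbb{Z}$ we define $[\alpha, \beta] = \{z \in \mathbb{Z} : \alpha \le z \le \beta\}$.

For a constraint k on the variables $x_i$ and $x_j$:

Observation 4. If $(\alpha, \beta), (\alpha, \gamma) \in \mathrm{constraint}(k)$ then $(\alpha, \delta) \in \mathrm{constraint}(k)$ for all
$\delta \in [\beta, \gamma]$.

Observation 5. If $(\alpha_1, \alpha_2), (\beta_1, \beta_2), (\gamma_1, \gamma_2) \in \mathrm{constraint}(k)$ then all points inside the
triangle induced by $(\alpha_1, \alpha_2)$, $(\beta_1, \beta_2)$ and $(\gamma_1, \gamma_2)$ satisfy constraint k.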

We present a routine in Fig. 1 which will be repeatedly used for constraint propagation².
It receives as input two arrays $\ell$ and $u$ of size n (passed by reference), two variable
indices i, j and a constraint index k on these two variables. The objective of this routine
is to find the impact of constraint k and the bounds $\ell_i, u_i$ on the bounds $\ell_j, u_j$.

We denote by $\ell^{\mathrm{after}}$ and $u^{\mathrm{after}}$ the values of $\ell$ and $u$ after calling
OneOnOneImpact($\ell$, u, i, j, k). Hence we get Observations 6 and 7, stated after Fig. 1.

² Constraint propagation was used for the LP version of the problem, e.g., see [1].

Routine OneOnOneImpact($\ell$, u, i, j, k)

Let constraint k be $a x_i + b x_j \ge c$.

If $b > 0$ then
    if $a > 0$ then $\ell' \leftarrow \lceil (c - a u_i)/b \rceil$
    else $\ell' \leftarrow \lceil (c - a \ell_i)/b \rceil$
    $\ell_j \leftarrow \max\{\ell_j, \ell'\}$
else
    if $a > 0$ then $u' \leftarrow \lfloor (c - a u_i)/b \rfloor$
    else $u' \leftarrow \lfloor (c - a \ell_i)/b \rfloor$
    $u_j \leftarrow \min\{u_j, u'\}$

Fig. 1. Routine OneOnOneImpact.
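A single propagation step of this kind can be sketched in Python. The rounding formulas below are an assumption derived directly from the inequality $a x_i + b x_j \ge c$ (the constraint is easiest to satisfy when $a x_i$ is as large as its bounds allow); the function name, the argument layout, and the requirement that a, b, c are integers with $b \ne 0$ are illustrative choices.

```python
def one_on_one_impact(lower, upper, i, j, a, b, c):
    """Tighten the bounds of x_j using a*x_i + b*x_j >= c and the bounds of x_i.

    lower and upper are lists of integers and are modified in place.
    Assumes integer coefficients with b != 0.
    """
    # Largest value a*x_i can take within [lower[i], upper[i]].
    best_axi = a * upper[i] if a > 0 else a * lower[i]
    rest = c - best_axi  # every feasible x_j must satisfy b*x_j >= rest
    if b > 0:
        # x_j >= ceil(rest / b), computed with integer arithmetic
        lower[j] = max(lower[j], -((-rest) // b))
    else:
        # dividing by a negative b flips the inequality: x_j <= floor(rest / b)
        upper[j] = min(upper[j], rest // b)


if __name__ == "__main__":
    lo, hi = [0, 0], [0, 1]
    one_on_one_impact(lo, hi, 0, 1, 1, 1, 1)  # x_0 + x_1 >= 1 with x_0 = 0
    print(lo, hi)
```

Integer floor division is used instead of floating-point rounding so that large coefficients do not cause precision errors.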



Observation 6. $\mathrm{sat}(\ell^{\mathrm{after}}, u^{\mathrm{after}}) = \mathrm{sat}(\ell, u)$.

Observation 7. If $\beta \in [\ell_j^{\mathrm{after}}, u_j^{\mathrm{after}}]$ there exists $\alpha \in [\ell_i^{\mathrm{after}}, u_i^{\mathrm{after}}]$ such that
$(\alpha, \beta) \in \mathrm{constraint}(k)$.

The routine in Fig. 2, which is called OneOnAllImpact, receives as input two arrays
$\ell$ and $u$ of size n (passed by reference) and a variable index t, and changes $\ell$ and $u$
according to the impact of $\ell_t$ and $u_t$ on all the intervals.

Routine OneOnAllImpact($\ell$, u, t)

Stack $\leftarrow$ {t}
While Stack $\ne \emptyset$ do
    i $\leftarrow$ POP(Stack)
    For each constraint k involving $x_i$ and another variable $x_j$
        OneOnOneImpact($\ell$, u, i, j, k)
        If $u_j < \ell_j$ then return "fail"
        If $\ell_j$ or $u_j$ changed then PUSH j into Stack

Fig. 2. Routine OneOnAllImpact.
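The stack-based propagation can be sketched as follows. This is an illustrative sketch, not the authors' implementation: constraints are assumed to be integer tuples `(i, j, a, b, c)` meaning $a x_i + b x_j \ge c$ with nonzero a and b, the incidence list stores each constraint once per endpoint with the coefficients swapped accordingly, and the helper `tighten` uses bound formulas derived from the inequality itself.

```python
from collections import defaultdict


def tighten(lower, upper, i, j, a, b, c):
    """Tighten x_j from a*x_i + b*x_j >= c; return True if a bound changed."""
    rest = c - (a * upper[i] if a > 0 else a * lower[i])
    old = (lower[j], upper[j])
    if b > 0:
        lower[j] = max(lower[j], -((-rest) // b))  # ceil(rest / b)
    else:
        upper[j] = min(upper[j], rest // b)        # floor(rest / b), b < 0
    return (lower[j], upper[j]) != old


def build_incidence(constraints):
    """Map each variable to the constraints it appears in, oriented from it."""
    incidence = defaultdict(list)
    for (i, j, a, b, c) in constraints:
        incidence[i].append((j, a, b, c))
        incidence[j].append((i, b, a, c))
    return incidence


def one_on_all_impact(lower, upper, t, incidence):
    """Propagate the bounds of x_t; return False if some range becomes empty."""
    stack = [t]
    while stack:
        i = stack.pop()
        for (j, a, b, c) in incidence[i]:
            if tighten(lower, upper, i, j, a, b, c):
                if upper[j] < lower[j]:
                    return False
                stack.append(j)
    return True


if __name__ == "__main__":
    inc = build_incidence([(0, 1, 1, 1, 1), (1, 2, -1, 1, 0)])
    lo, hi = [0, 0, 0], [0, 1, 1]      # x_0 has just been fixed to 0
    print(one_on_all_impact(lo, hi, 0, inc), lo, hi)
```

Because bounds only ever tighten and a variable is pushed only after a strict change, the loop terminates after at most $\sum_i (u_i - \ell_i)$ pushes beyond the initial one.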

We now prove that we do not lose feasible solutions after activating OneOnAllImpact. 



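Lemma 8. If $\ell^{\mathrm{after}}$ and $u^{\mathrm{after}}$ are the values of $\ell$ and $u$ after calling OneOnAllImpact
then $\mathrm{sat}(\ell^{\mathrm{after}}, u^{\mathrm{after}}) = \mathrm{sat}(\ell, u)$.

Proof. All changes made to $\ell$ and $u$ are done by routine OneOnOneImpact. It is easy to
prove the lemma by induction using Observation 6. □

Lemma 9. If OneOnAllImpact($\ell$, u, t) terminates without failure with the bounds $\ell^{\mathrm{after}}$,
$u^{\mathrm{after}}$, and $\mathrm{sat}((\ell_1, \ldots, \ell_{t-1}, -\infty, \ell_{t+1}, \ldots, \ell_n), (u_1, \ldots, u_{t-1}, \infty, u_{t+1}, \ldots, u_n)) \ne \emptyset$,
then $\mathrm{sat}(\ell^{\mathrm{after}}, u^{\mathrm{after}}) \ne \emptyset$.

Proof. Let $y \in \mathrm{sat}((\ell_1, \ldots, \ell_{t-1}, -\infty, \ell_{t+1}, \ldots, \ell_n), (u_1, \ldots, u_{t-1}, \infty, u_{t+1}, \ldots, u_n))$.
We define a vector $y'$ as:

$$
y'_j = \begin{cases}
y_j, & y_j \in [\ell_j^{\mathrm{after}}, u_j^{\mathrm{after}}] \\
\ell_j^{\mathrm{after}}, & y_j < \ell_j^{\mathrm{after}} \\
u_j^{\mathrm{after}}, & y_j > u_j^{\mathrm{after}}
\end{cases}
$$

Consider constraint k on $x_i$ and $x_j$. We need to show that $(y'_i, y'_j) \in \mathrm{constraint}(k)$.

Case 1: $y_i \in [\ell_i^{\mathrm{after}}, u_i^{\mathrm{after}}]$ and $y_j \in [\ell_j^{\mathrm{after}}, u_j^{\mathrm{after}}]$.
$(y'_i, y'_j) = (y_i, y_j) \in \mathrm{constraint}(k)$.

Case 2: $y_i < \ell_i^{\mathrm{after}}$ and $y_j \in [\ell_j^{\mathrm{after}}, u_j^{\mathrm{after}}]$.
y is a feasible solution, thus $(y_i, y_j) \in \mathrm{constraint}(k)$. When we changed the lower
bound of $x_i$ to $\ell_i^{\mathrm{after}}$ we called OneOnOneImpact for all constraints involving
$x_i$ including constraint k. By Observation 7 there exists $\alpha \in [\ell_i^{\mathrm{after}}, u_i^{\mathrm{after}}]$ for
which $(\alpha, y_j) \in \mathrm{constraint}(k)$. Thus by Observation 4 we get that $(\ell_i^{\mathrm{after}}, y_j) \in
\mathrm{constraint}(k)$.

Case 3: $y_i < \ell_i^{\mathrm{after}}$ and $y_j < \ell_j^{\mathrm{after}}$.
y is a feasible solution, thus $(y_i, y_j) \in \mathrm{constraint}(k)$. When we changed the lower
bound of $x_i$ to $\ell_i^{\mathrm{after}}$ we called OneOnOneImpact for all constraints involving $x_i$
including constraint k. By Observation 7 there exists $\alpha \in [\ell_i^{\mathrm{after}}, u_i^{\mathrm{after}}]$ for which
$(\alpha, \ell_j^{\mathrm{after}}) \in \mathrm{constraint}(k)$. From the same arguments we get that there exists $\beta \in
[\ell_j^{\mathrm{after}}, u_j^{\mathrm{after}}]$ for which $(\ell_i^{\mathrm{after}}, \beta) \in \mathrm{constraint}(k)$ as well. Thus by Observation 5
we get that $(\ell_i^{\mathrm{after}}, \ell_j^{\mathrm{after}}) \in \mathrm{constraint}(k)$.

Other cases are similar to Cases 2 and 3. □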



The algorithm in Fig. 3 returns a feasible solution if such a solution exists.

Algorithm Feasibility($\ell$, $u \in \mathbb{N}^n$)

If $\ell = u$ then
    If $x = \ell$ is a feasible solution then return $\ell$
    else return "fail"
Choose a variable $x_t$ for which $\ell_t < u_t$
$\alpha \leftarrow \lfloor \frac{1}{2}(\ell_t + u_t) \rfloor$   /* An arbitrary value $\alpha \in [\ell_t, u_t - 1]$ suffices as well */
$(\ell^{\mathrm{left}}, u^{\mathrm{left}}) \leftarrow (\ell, (u_1, \ldots, u_{t-1}, \alpha, u_{t+1}, \ldots, u_n))$
$(\ell^{\mathrm{right}}, u^{\mathrm{right}}) \leftarrow ((\ell_1, \ldots, \ell_{t-1}, \alpha + 1, \ell_{t+1}, \ldots, \ell_n), u)$
Call OneOnAllImpact($\ell^{\mathrm{left}}, u^{\mathrm{left}}, t$) and OneOnAllImpact($\ell^{\mathrm{right}}, u^{\mathrm{right}}, t$)
If both calls fail then return "fail"
Choose a successful run of OneOnAllImpact
If call left was chosen
    then return Feasibility($\ell^{\mathrm{left}}, u^{\mathrm{left}}$)
    else return Feasibility($\ell^{\mathrm{right}}, u^{\mathrm{right}}$)

Fig. 3. Feasibility Algorithm

Theorem 10. Algorithm Feasibility returns a feasible solution if such a solution exists.

Proof. Each recursive call reduces at least one of the ranges (the t-th), thus the algorithm
must terminate. By Lemma 8, if $\mathrm{sat}(\ell, u) = \emptyset$ the algorithm returns "fail". On the other
hand, if $\mathrm{sat}(\ell, u) \ne \emptyset$ we prove by induction on $\sum_{i=1}^{n} (u_i - \ell_i)$ that the algorithm finds
a feasible solution.

Base: $\sum_{i=1}^{n} (u_i - \ell_i) = 0$ implies $\ell = u$, thus $x = \ell$ is a feasible solution.

Step: By Lemma 8 at least one of the calls to OneOnAllImpact terminates without
failure. If call left was chosen then by Lemma 9 we know that $\mathrm{sat}(\ell^{\mathrm{left}}, u^{\mathrm{left}}) \ne \emptyset$,
therefore by the induction hypothesis we can find a feasible solution for $\mathrm{sat}(\ell^{\mathrm{left}}, u^{\mathrm{left}})$.
Obviously a feasible solution $x \in \mathrm{sat}(\ell^{\mathrm{left}}, u^{\mathrm{left}})$ satisfies $x \in \mathrm{sat}(\ell, u)$. The same
goes for call right. This concludes the proof. □



Theorem 11. Algorithm Feasibility can be implemented in time $O(mU)$ and space
$O(m)$.

Proof. To achieve time complexity of $O(mU)$, we run both calls to OneOnAllImpact
in parallel (this approach was used for 2SAT by Even et al. [5]), and prefer the faster
option of the two, if a choice exists. After every change in the range of a variable
$x_i$, we need to check the $m_i$ constraints involving this variable, in order to discover
the impact of the change. To perform this task efficiently we can store the input in
an incidence list, where every variable has its constraints list. As $x_i$ can be changed
up to $(u_i - \ell_i)$ times, we conclude that the total time complexity of the changes is
$O(\sum_{i=1}^{n} m_i (u_i - \ell_i)) = O(mU)$ (the time wasted on unfinished trials is bounded by
the time complexity of the chosen trials). The algorithm uses $O(m)$ space for the input
and a constant number of arrays of size n, thus uses linear space. □
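The overall split-and-propagate scheme can be sketched in Python. This is a simplified illustration and not the implementation analysed above: the two propagation trials run one after the other rather than in parallel, and the bound arrays are copied for each trial, so the sketch does not attain the $O(mU)$ time bound. Constraints are assumed to be integer tuples `(i, j, a, b, c)` meaning $a x_i + b x_j \ge c$ with nonzero a and b; all names are illustrative.

```python
def propagate(lower, upper, t, incidence):
    """Stack-based bound propagation from x_t; False if a range becomes empty."""
    stack = [t]
    while stack:
        i = stack.pop()
        for (j, a, b, c) in incidence.get(i, ()):
            rest = c - (a * upper[i] if a > 0 else a * lower[i])
            old = (lower[j], upper[j])
            if b > 0:
                lower[j] = max(lower[j], -((-rest) // b))  # ceil(rest / b)
            else:
                upper[j] = min(upper[j], rest // b)        # floor(rest / b)
            if (lower[j], upper[j]) != old:
                if upper[j] < lower[j]:
                    return False
                stack.append(j)
    return True


def feasibility(constraints, lower, upper):
    """Return a feasible integer vector, or None if the system is infeasible."""
    incidence = {}
    for (i, j, a, b, c) in constraints:
        incidence.setdefault(i, []).append((j, a, b, c))
        incidence.setdefault(j, []).append((i, b, a, c))
    lower, upper = list(lower), list(upper)
    if any(lo > hi for lo, hi in zip(lower, upper)):
        return None
    while True:
        # Pick any variable whose range still contains more than one value.
        t = next((i for i in range(len(lower)) if lower[i] < upper[i]), None)
        if t is None:
            ok = all(a * lower[i] + b * lower[j] >= c
                     for (i, j, a, b, c) in constraints)
            return lower if ok else None
        alpha = (lower[t] + upper[t]) // 2
        # Try the left half [l_t, alpha], then the right half [alpha + 1, u_t].
        for new_lo, new_hi in ((lower[t], alpha), (alpha + 1, upper[t])):
            lo, hi = list(lower), list(upper)
            lo[t], hi[t] = new_lo, new_hi
            if propagate(lo, hi, t, incidence):
                lower, upper = lo, hi
                break
        else:
            return None  # both trials failed


if __name__ == "__main__":
    cons = [(0, 1, 1, -1, 2), (1, 2, 1, 1, 5)]  # x0 - x1 >= 2, x1 + x2 >= 5
    print(feasibility(cons, [0, 0, 0], [4, 4, 4]))
```

The sketch commits to the first trial that succeeds, which is what Lemma 9 justifies: a successful propagation never discards the last feasible solution.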



3 From Feasibility to Approximation 

Before presenting our approximation algorithm, let us first discuss the special case where
$|U| = 2$, which is the minimum 2SAT problem. The approach of Gusfield and Pitt [8] can
be viewed as follows. The 2CNF formula can be presented as a digraph where each vertex
represents a Boolean variable or its negation, and an edge represents a OneOnOneImpact
propagation (a logical implication). The propagation of an assignment can be viewed as a
traversal (e.g., BFS, DFS) in the digraph. In order to get the OneOnAllImpact mechanism,
Gusfield and Pitt's algorithm starts with a preprocess of constructing a transitive closure.
This preprocess uses $O(n^2)$ extra memory, which is expensive. It is much more critical when
we try to generalize Gusfield and Pitt's algorithm to 2VIP; in this case the preprocess
uses $\Omega(n^2U^2)$ extra memory. As far as we know, every algorithm which relies upon
direct 2SAT transformation suffers from this drawback.

We present an $O(nmU)$ time and $O(m)$ space 2-approximation algorithm, which is
a specific implementation of our feasibility algorithm. Not only does this seem natural,
but also its complexity of $O(nm)$ time and $O(m)$ space in the case of 2SAT dominates
that of Gusfield and Pitt's 2SAT algorithm.

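In order to use the local-ratio technique [2] we extend the problem definition. Given
$\ell, u \in \mathbb{N}^n$ and $\hat\ell, \hat u \in \mathbb{R}^n$ for which $\ell \le \hat\ell \le \hat u \le u$ we define the following Extended
2VIP problem:

\begin{align*}
\text{(E2VIP)} \quad \min\ & \sum_{i=1}^{n} \Delta(x_i, \hat\ell_i, \hat u_i)\, w_i \\
\text{s.t.}\ & a_k x_{i_k} + b_k x_{j_k} \ge c_k && \forall k \in \{1, \ldots, m\} \\
& x_i \in [\ell_i, u_i] && \forall i \in \{1, \ldots, n\}
\end{align*}

where

$$
\Delta(x_i, \hat\ell_i, \hat u_i) = \begin{cases}
x_i - \hat\ell_i, & x_i \in [\hat\ell_i, \hat u_i] \\
\hat u_i - \hat\ell_i, & x_i > \hat u_i \\
0, & x_i < \hat\ell_i
\end{cases}
$$

and $1 \le i_k, j_k \le n$, $w_i \ge 0$, $a, b, c \in \mathbb{Z}^m$ and $\ell, u \in \mathbb{N}^n$.

We define $W(x, \hat\ell, \hat u) = \sum_{i=1}^{n} \Delta(x_i, \hat\ell_i, \hat u_i)\, w_i$. A feasible solution $x^*$ is called
an optimal solution if for every feasible solution $x$: $W(x^*, \hat\ell, \hat u) \le W(x, \hat\ell, \hat u)$. We
define $W^*(\hat\ell, \hat u) = W(x^*, \hat\ell, \hat u)$. A feasible solution $x$ is called an r-approximation if
$W(x, \hat\ell, \hat u) \le r \cdot W^*(\hat\ell, \hat u)$.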



Observation 12. Given $\hat\ell, \hat u, \hat m \in \mathbb{R}^n$ for which $\hat\ell \le \hat m \le \hat u$ we get

$W(x, \hat\ell, \hat u) = W(x, \hat\ell, \hat m) + W(x, \hat m, \hat u)$.
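The additivity stated in Observation 12 can be checked numerically with a short sketch. It assumes the reading of $\Delta$ as a clamped distance: $x_i - \hat\ell_i$ inside the interval, $\hat u_i - \hat\ell_i$ above it, and 0 below it. The function names and the sample vectors are illustrative only.

```python
def delta(x, lo_hat, hi_hat):
    """Clamped distance of x from lo_hat, capped at hi_hat - lo_hat."""
    if x < lo_hat:
        return 0
    if x > hi_hat:
        return hi_hat - lo_hat
    return x - lo_hat


def weight(x, lo_hat, hi_hat, w):
    """W(x, lo_hat, hi_hat) = sum_i w_i * delta(x_i, lo_hat_i, hi_hat_i)."""
    return sum(wi * delta(xi, lo, hi)
               for xi, lo, hi, wi in zip(x, lo_hat, hi_hat, w))


if __name__ == "__main__":
    x = [0, 3, 5]
    lo_hat, mid_hat, hi_hat = [1, 1, 1], [2, 2.5, 3], [4, 4, 4]
    w = [2, 1, 3]
    whole = weight(x, lo_hat, hi_hat, w)
    parts = weight(x, lo_hat, mid_hat, w) + weight(x, mid_hat, hi_hat, w)
    print(whole, parts)
```

Splitting the interval at any $\hat m$ between $\hat\ell$ and $\hat u$ only splits each clamped distance into two pieces, which is why the two sums agree.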



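Similarly to the Decomposition Observation from [2] we have:

Observation 13. (Decomposition Observation)
Given $\hat\ell, \hat u, \hat m \in \mathbb{R}^n$ such that $\hat\ell \le \hat m \le \hat u$ then

$W^*(\hat\ell, \hat m) + W^*(\hat m, \hat u) \le W^*(\hat\ell, \hat u)$.

Proof. We denote by $x^*$ an optimal solution for the system with regard to $\hat\ell, \hat m$, by $y^*$ an
optimal solution with regard to $\hat m, \hat u$, and by $z^*$ an optimal solution with regard to $\hat\ell, \hat u$.

\begin{align*}
W^*(\hat\ell, \hat m) + W^*(\hat m, \hat u) &= W(x^*, \hat\ell, \hat m) + W(y^*, \hat m, \hat u) && \text{[By definition]} \\
&\le W(z^*, \hat\ell, \hat m) + W(z^*, \hat m, \hat u) && \text{[Optimality of } x^*, y^*\text{]} \\
&= W(z^*, \hat\ell, \hat u) && \text{[Observation 12]} \\
&= W^*(\hat\ell, \hat u) && \text{[By definition]}
\end{align*}

□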



The following is this paper's version of the Local-Ratio Theorem (see [2,4]):

Theorem 14. If $x$ is an r-approximation with regard to $\hat\ell, \hat m$ and an r-approximation
with regard to $\hat m, \hat u$ then $x$ is an r-approximation with regard to $\hat\ell, \hat u$.

Proof.

\begin{align*}
W(x, \hat\ell, \hat u) &= W(x, \hat\ell, \hat m) + W(x, \hat m, \hat u) && \text{[Observation 12]} \\
&\le r \cdot W^*(\hat\ell, \hat m) + r \cdot W^*(\hat m, \hat u) && \text{[Given]} \\
&\le r \cdot W^*(\hat\ell, \hat u) && \text{[Decomposition Observation]}
\end{align*}

□

Definition 15. Given $a, b \in \mathbb{R}^n$ we define

$\max\{a, b\} = (\max\{a_1, b_1\}, \ldots, \max\{a_n, b_n\})$
$\min\{a, b\} = (\min\{a_1, b_1\}, \ldots, \min\{a_n, b_n\})$.

We are ready to present the 2-approximation algorithm (see Fig. 4).



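Algorithm Approximate($\ell, u \in \mathbb{N}^n$; $\hat\ell, \hat u \in \mathbb{R}^n$)

If $\hat\ell \not\ge \ell$ then return Approximate($\ell, u, \max\{\hat\ell, \ell\}, \hat u$)
If $\hat u \not\le u$ then return Approximate($\ell, u, \hat\ell, \min\{\hat u, u\}$)
If $\ell = u$ then
    If $x = \ell$ is a feasible solution
        then return $\ell$
        else return "fail"
Choose a variable $x_t$ for which $\ell_t < u_t$
$\alpha \leftarrow \lfloor \frac{1}{2}(\ell_t + u_t) \rfloor$
$(\ell^{\mathrm{left}}, u^{\mathrm{left}}) \leftarrow (\ell, (u_1, \ldots, u_{t-1}, \alpha, u_{t+1}, \ldots, u_n))$
$(\ell^{\mathrm{right}}, u^{\mathrm{right}}) \leftarrow ((\ell_1, \ldots, \ell_{t-1}, \alpha + 1, \ell_{t+1}, \ldots, \ell_n), u)$
Call OneOnAllImpact($\ell^{\mathrm{left}}, u^{\mathrm{left}}, t$) and OneOnAllImpact($\ell^{\mathrm{right}}, u^{\mathrm{right}}, t$)
If both calls failed then return "fail"
If call right failed then return Approximate($\ell^{\mathrm{left}}, u^{\mathrm{left}}, \hat\ell, \hat u$)
If call left failed then return Approximate($\ell^{\mathrm{right}}, u^{\mathrm{right}}, \hat\ell, \hat u$)
If $W(\ell^{\mathrm{left}}, \hat\ell, \hat u) \le W(\ell^{\mathrm{right}}, \hat\ell, \hat u)$
then
    Find $\hat m \in \mathbb{R}^n$ such that:
        $\hat\ell \le \hat m \le \max\{\ldots\}$ and $W(\ell^{\mathrm{right}}, \hat\ell, \hat m) = W(\ell^{\mathrm{left}}, \hat\ell, \hat u)$
    $\hat m \leftarrow \max\{\hat m, \ell^{\mathrm{left}}\}$
    Return Approximate($\ell^{\mathrm{left}}, u^{\mathrm{left}}, \hat m, \hat u$)
else
    Find $\hat m \in \mathbb{R}^n$ such that:
        $\hat\ell \le \hat m \le \max\{\ldots\}$ and $W(\ell^{\mathrm{left}}, \hat\ell, \hat m) = W(\ell^{\mathrm{right}}, \hat\ell, \hat u)$
    $\hat m \leftarrow \max\{\hat m, \ell^{\mathrm{right}}\}$
    Return Approximate($\ell^{\mathrm{right}}, u^{\mathrm{right}}, \hat m, \hat u$)

Fig. 4. 2-Approximation Algorithm

Observation 16. Algorithm Approximate is a specific implementation of Algorithm
Feasibility.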



Theorem 17. Algorithm Approximate is a 2-approximation algorithm for
E2VIP systems.

Proof. By Observation 16, when Algorithm Approximate returns a solution it is a
feasible solution. If $\mathrm{sat}(\ell, u) \ne \emptyset$ we prove by induction that the algorithm finds a
2-approximation.

Base: $u = \ell$ implies $W(\ell, \hat\ell, \hat u) = 0$.
Step: There are several cases:

Case 1: $\hat\ell \not\ge \ell$:
A 2-approximation solution with respect to $\hat\ell$ is obviously a 2-approximation
solution with respect to $\max\{\hat\ell, \ell\}$.

Case 2: $\hat u \not\le u$:
Trivial.

Case 3: Call right failed:
By Lemma 8 there is no feasible solution which satisfies $x_t \ge \alpha + 1$, therefore
we do not change the problem by adding the constraint $x_t \le \alpha$. By Lemma 8
calling OneOnAllImpact($\ell^{\mathrm{left}}, u^{\mathrm{left}}, t$) does not change the problem, as well.

Case 4: Call left failed:
Similar to Case 3.

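Case 5: Both calls succeeded and $W(\ell^{\mathrm{left}}, \hat\ell, \hat u) \le W(\ell^{\mathrm{right}}, \hat\ell, \hat u)$:
We first show that every feasible solution is a 2-approximation with regard to $\hat\ell$
and $\hat m$. We examine an optimal solution $x^*$. If $x^* \ge \ell^{\mathrm{left}}$ then
$\hat m \ge \ell^{\mathrm{left}}$ implies
$W(x^*, \hat\ell, \hat m) \ge W(\ell^{\mathrm{left}}, \hat\ell, \hat m)$. If $x^* \ge \ell^{\mathrm{right}}$ then by the definition of $\hat m$ we
get that $W(x^*, \hat\ell, \hat m) \ge W(\ell^{\mathrm{left}}, \hat\ell, \hat u) \ge W(\ell^{\mathrm{left}}, \hat\ell, \hat m)$.

On the other hand, $W(\hat m, \hat\ell, \hat u) \le 2 \cdot W(\ell^{\mathrm{left}}, \hat\ell, \hat m)$, therefore $W(x, \hat\ell, \hat m) \le
2 \cdot W(x^*, \hat\ell, \hat m)$ for every feasible solution $x$. Therefore by Theorem 14 a
2-approximation with regard to $\hat m$ and $\hat u$ is a 2-approximation with regard to $\hat\ell$
and $\hat u$.

We need to show that there exists an optimal solution $x^*$ for which $x^*_t \le \alpha$. For
every feasible solution $y$ such that $y_t \ge \alpha + 1$ we define $y'$ as:

$$
y'_i = \begin{cases}
y_i, & y_i \in [\ell_i^{\mathrm{left}}, u_i^{\mathrm{left}}] \\
\ell_i^{\mathrm{left}}, & y_i < \ell_i^{\mathrm{left}} \\
u_i^{\mathrm{left}}, & y_i > u_i^{\mathrm{left}}
\end{cases}
$$

By Lemma 9 $y'$ is a feasible solution. $\ell^{\mathrm{left}} \le \hat m$ implies $W(y', \hat m, \hat u) \le
W(y, \hat m, \hat u)$, thus there is an optimal solution with regard to $\hat m$ and $\hat u$ within the
bounds $\ell^{\mathrm{left}}, u^{\mathrm{left}}$. Therefore a 2-approximation within the bounds $\ell^{\mathrm{left}}, u^{\mathrm{left}}$
is a 2-approximation with regard to $\hat m$ and $\hat u$.

Case 6: Both calls succeeded and $W(\ell^{\mathrm{left}}, \hat\ell, \hat u) > W(\ell^{\mathrm{right}}, \hat\ell, \hat u)$:
Similar to Case 5.

□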



Corollary 18. Algorithm Approximate is a 2-approximation algorithm for 2VIP systems.



Theorem 19. Algorithm Approximate can be implemented in time $O(nmU)$ and space
$O(m)$.

Proof. In order to get the required time complexity, we must choose the $x_i$'s wisely.
One possibility is to choose the variables in increasing order, i.e. $x_1, x_2, \ldots, x_n$, and
to restart again from the beginning after reaching $x_n$. We call such n iterations on all
n variables a pass. As stated before, changing the range of $x_i$ might cause changes in
the ranges of other variables. The existence of a constraint on $x_i$ and another variable
$x_j$ makes $x_j$ a candidate for a range update. This means that we have to check the
$m_i$ constraints involving $x_i$, to discover the consequences of changing its range, each
time this range changes. $x_i$ can be changed up to $u_i - \ell_i$ times, therefore we get that
the time complexity of a single iteration is $O(mU + n) = O(mU)$. One pass may
involve all n variables, so the time complexity of one pass is $O(nmU)$. By choosing
$\alpha = \lfloor \frac{1}{2}(\ell_t + u_t) \rfloor$ we reduce the possible range for $x_t$ at least by half. Therefore in
a single pass we reduce the possible ranges for all variables at least by half. Thus we
get that the total time complexity is $O\bigl(mn \sum_{k=1}^{\log U} \frac{U}{2^k}\bigr) = O(mnU)$. As before, the
algorithm uses an incidence list data structure, thus uses linear space. □

Abstract. In network scheduling a set of jobs must be scheduled on unrelated
parallel processors or machines which are connected by a network. Initially, each
job is located on some machine in the network and cannot be started on another
machine until sufficient time elapses to allow the job to be transmitted there. This
setting has applications, e.g., in distributed multi-processor computing environ-
ments and also in operations research; it can be modeled by a standard parallel
machine environment with machine-dependent release dates. We consider the
objective of minimizing the total weighted completion time.

The main contribution of this paper is a provably good convex quadratic pro-
gramming relaxation of strongly polynomial size for this problem. Until now,
only linear programming relaxations in time- or interval-indexed variables have
been studied. Those LP relaxations, however, suffer from a huge number of vari-
ables. In particular, the best previously known relaxation is of exponential size
and can therefore not be solved exactly in polynomial time. As a result of the
convex quadratic programming approach we can give a very simple and easy to
analyze randomized 2-approximation algorithm which slightly improves upon
the best previously known approximation result. Furthermore, we consider
preemptive variants of network scheduling and derive approximation results and
results on the power of preemption which improve upon the best previously known
results for these settings.



1 Introduction 

We study the following parallel machine scheduling problem. A set J of n jobs has to
be scheduled on m unrelated parallel machines which are connected by a network. The
jobs continually arrive over time and each job originates at some node of the network.
Therefore, before a job can be processed on another machine, it must take the time to
travel there through the network. This is modeled by machine-dependent release dates
$r_{ij} \ge 0$ which denote the earliest point in time when job j may be processed on machine
i. Together with each job j we are given its positive processing requirement, which also
depends on the machine i that job j will be processed on and is therefore denoted by $p_{ij}$.
Each job j must be processed for the respective amount of time without interruption on
one of the m machines, and may be assigned to any of them. However, for a given job
j it may happen that $p_{ij} = \infty$ for some (but not all) machines i, such that job j cannot
be scheduled on those machines. Every machine can process at most one job at a time.
This network scheduling model was introduced in [4, 1].

We denote the completion time of job j by $C_j$. The goal is to minimize the total
weighted completion time: a weight $w_j \ge 0$ is associated with each job j and we seek
to minimize $\sum_{j \in J} w_j C_j$. In scheduling, it is quite convenient to refer to the respective
problems using the standard classification scheme of Graham, Lawler, Lenstra, and
Rinnooy Kan [7]. The problem $R\,|\,r_{ij}\,|\,\sum w_j C_j$, just described, is strongly NP-hard, even
for the special case of two identical parallel machines without nontrivial release dates,
see [2, 12].

Since we cannot hope to be able to compute optimal schedules in polynomial time,
we are interested in how close one can approach the optimum in polynomial time. A
(randomized) $\alpha$-approximation algorithm computes in polynomial time a feasible
solution to the problem under consideration whose (expected) value is bounded by $\alpha$ times
the value of an optimal solution; $\alpha$ is called the performance guarantee or performance
ratio of the algorithm. All randomized approximation algorithms that we discuss or
present can be derandomized by standard methods; therefore we will not go into the
details of derandomization.

The first approximation result for the scheduling problem $R\,|\,r_{ij}\,|\,\sum w_j C_j$ was
obtained by Phillips, Stein, and Wein [15] who gave an algorithm with performance
guarantee $O(\log^2 n)$. The first constant factor approximation was developed by Hall,
Shmoys, and Wein [9] (see also [8]) whose algorithm achieves performance ratio $\frac{16}{3}$.
Generalizing a single machine approximation algorithm of Goemans [6], this result
was then improved by Schulz and Skutella [18] to a $(2 + \varepsilon)$-approximation algorithm.
All those approximation results rely somehow on (integer) linear programming
formulations or relaxations in time-indexed variables. In the following discussion we
assume that all processing times and release dates are integral; furthermore, we define
$p_{\max} := \max_{i,j} p_{ij}$.

Phillips, Stein, and Wein modeled the network scheduling problem as a hypergraph
matching problem by matching each job j to $p_{ij}$ consecutive time intervals of length 1
on a machine i. The underlying graph contains a node for each job and each pair formed
by a machine and a time interval $[t, t+1)$ where t is integral and can achieve values in
a range of size $n p_{\max}$. Therefore, since $p_{\max}$ may be exponential in the input size, the
corresponding integer linear program contains exponentially many variables as well as
exponentially many constraints. Phillips et al. eluded this problem by partitioning the
set of jobs into groups such that the jobs in each group can be scaled down to polynomial
size. However, this complicates both the design and the analysis of their approximation
algorithm.

The result of Hall, Shmoys, and Wein is based on a polynomial variant of time-
indexed formulations which they called interval-indexed. The basic idea is to replace
the intervals of length 1 by time intervals $[2^k, 2^{k+1})$ of geometrically increasing size.
The decision variables in the resulting linear programming relaxation then indicate on
which machine and in which time interval a given job completes. Notice, however, that
one loses already at least a factor of 2 in this formulation since the interval-indexed
variables do not allow a higher precision for the completion times of jobs. The
approximation algorithm of Hall et al. relies on Shmoys and Tardos' rounding technique for
the generalized assignment problem [20].

Schulz and Skutella generalized an LP relaxation in time-indexed variables that
was introduced by Dyer and Wolsey [5] for the corresponding single machine
scheduling problem. It contains a decision variable for each triple formed by a job, a machine,
and a time interval $[t, t+1)$ which indicates whether the job is being processed in this
time interval on the respective machine. The resulting LP relaxation is a 2-relaxation
of the scheduling problem under consideration, i.e., the optimum LP value is within a
factor 2 of the value of an optimal schedule. However, like the formulation of Phillips et
al., this relaxation suffers from an exponential number of variables and constraints. One
can overcome this drawback by turning again to interval-indexed variables. However,
in order to ensure a higher precision, Schulz and Skutella used time intervals of the
form $[(1+\varepsilon)^k, (1+\varepsilon)^{k+1})$ where $\varepsilon > 0$ can be chosen arbitrarily small; this leads to a
$(2 + \varepsilon)$-relaxation of polynomial size. Notice, however, that the size of the relaxation
still depends substantially on $p_{\max}$ and may be huge for small values of $\varepsilon$. The
approximation algorithm based on this LP relaxation uses a randomized rounding technique.

For the problem of scheduling unrelated parallel machines in the absence of nontrivial
release dates, $R\,|\,|\,\sum w_j C_j$, the author has introduced a convex quadratic programming
relaxation that leads to a simple $\frac{3}{2}$-approximation algorithm [22]. One of the basic
observations for this result is that in the absence of nontrivial release dates the parallel
machine problem can be reduced to an assignment problem of jobs to machines; for a
given assignment of jobs to machines the sequencing of the assigned jobs can be done
optimally on each machine i by applying Smith's Ratio Rule [24]: schedule the jobs in
order of nonincreasing ratios $w_j / p_{ij}$. Therefore, the problem can be formulated as an
integer quadratic program in assignment variables. An appropriate relaxation of this
program together with randomized rounding leads to the approximation result mentioned
above. Independently, the same result has later also been derived by Jay Sethuraman
and Mark S. Squillante [19].
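Smith's Ratio Rule on a single machine without release dates can be illustrated with a minimal sketch. Jobs are assumed to be `(weight, processing_time)` pairs with positive processing times; the function names and the sample instance are illustrative, and the exhaustive search is included only as a check on tiny inputs.

```python
from itertools import permutations


def weighted_completion(order, jobs):
    """Sum of w_j * C_j when the jobs are processed back to back in this order."""
    time = 0
    total = 0
    for j in order:
        weight, processing = jobs[j]
        time += processing        # completion time C_j of job j
        total += weight * time
    return total


def smith_order(jobs):
    """Job indices sorted by nonincreasing ratio w_j / p_j (Smith's Ratio Rule)."""
    return sorted(range(len(jobs)),
                  key=lambda j: jobs[j][0] / jobs[j][1],
                  reverse=True)


def brute_force_optimum(jobs):
    """Minimum of sum w_j * C_j over all orders; only usable for tiny inputs."""
    return min(weighted_completion(p, jobs)
               for p in permutations(range(len(jobs))))


if __name__ == "__main__":
    jobs = [(3, 2), (1, 4), (5, 1), (2, 2)]
    order = smith_order(jobs)
    print(order, weighted_completion(order, jobs), brute_force_optimum(jobs))
```

With release dates this greedy order is no longer optimal in general, which is the difficulty the time-slot construction discussed next is designed to handle.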

Unfortunately, for the general network scheduling problem including release dates
the situation is more complicated; for a given assignment of jobs to machines, the
sequencing problem on each machine is still strongly NP-hard, see [12]. However, we
know that in an optimal schedule a 'violation' of Smith's Ratio Rule can only occur
after a new job has been released; in other words, whenever two successive jobs on
machine i can be exchanged without violating release dates, the job with the higher ratio
$w_j / p_{ij}$ will be processed first in an optimal schedule. Therefore, the sequencing of jobs
that are being processed between two successive release dates can be done optimally
by Smith's Ratio Rule. We make use of this insight by partitioning the processing on
each machine i into n time slots which are essentially defined by the n release dates $r_{ij}$,
$j \in J$; since the sequencing of jobs in each time slot is easy, we have to solve an
assignment problem of jobs to time slots and can apply similar ideas as in [22]. In particular,
we derive a convex quadratic programming relaxation in $n^2 m$ assignment variables and
$O(nm)$ constraints. Randomized rounding based on an optimal solution to this
relaxation finally leads to a very simple and easy to analyze 2-approximation algorithm for
network scheduling.

Our technique can be extended to network scheduling problems with preemptions.
In preemptive scheduling, a job may repeatedly be interrupted and continued later on
another (or the same) machine. In the context of network scheduling it is reasonable
to assume that after a job has been interrupted on one machine, it cannot immediately
be continued on another machine; it must again take the time to travel there through
the network. We call the delay caused by such a transfer communication delay. In a
similar context, communication delays between precedence constrained jobs have been
studied, see, e.g., [14].

We give a 3-approximation algorithm for the problem $R \mid r_{ij}, pmtn \mid \sum w_j C_j$ that,
in fact, does not make use of preemptions but computes nonpreemptive schedules.
Therefore, this approximation result also holds for preemptive network scheduling with
arbitrary communication delays. Moreover, it also implies a bound on the power of
preemption, i. e., one cannot gain more than a factor 3 by allowing preemptions. For the
problem without nontrivial release dates $R \mid pmtn \mid \sum w_j C_j$, the same technique yields a
2-approximation algorithm. For the preemptive scheduling problems without
communication delays, Phillips, Stein, and Wein [16] gave an $(8 + \varepsilon)$-approximation. In [21]
the author has achieved slightly worse results than those presented here, based on a
time-indexed LP relaxation in the spirit of [18].

The paper is organized as follows. In the next section we introduce the concept of 
scheduling in time slots. We give an integer quadratic programming formulation of the 
network scheduling problem in Section 3 and show how it can be relaxed to a con- 
vex quadratic program. In Section 4 we present a simple 2-approximation algorithm 
and prove a bound on the quality of the convex quadratic programming relaxation. Fi- 
nally, in Section 5, we briefly sketch the results and techniques for preemptive network 
scheduling. 

Due to space limitations, we do not provide proofs in this extended abstract; we 
refer to the full paper [23] which combines [22] and the paper at hand and can be found 
on the author’s homepage.



2 Scheduling in time slots 



The main idea of our approach for the scheduling problem $R \mid r_{ij} \mid \sum w_j C_j$ is to
somehow get rid of the release dates of jobs. We do this by partitioning time on each
machine $i$ into several time slots. Each job is being processed on one machine in one of
its time slots and we make sure that job $j$ can only be processed in a slot that starts
after its release date.

Let $\rho_{i_1} \le \rho_{i_2} \le \dots \le \rho_{i_n}$ be an ordering of the release dates $r_{ij}$, $j \in J$;
moreover, we set $\rho_{i_{n+1}} := \infty$. For a given feasible schedule we say that $i_k$, the $k$th
time slot on machine $i$, contains all jobs $j$ that are started within the interval
$[\rho_{i_k}, \rho_{i_{k+1}})$ on machine $i$; we denote this by $j \in i_k$. We may assume that there is no
idle time between the processing of jobs in one time slot, i. e., all jobs in a slot are
processed one after another without interruption.

Moreover, as a consequence of Smith’s Ratio Rule we can restrict to schedules
where the jobs in time slot $i_k$ are sequenced in order of nonincreasing ratios $w_j/p_{ij}$.
Throughout the paper we will use the following convention: whenever we apply Smith’s
Ratio Rule in a time slot on machine $i$ and $w_k/p_{ik} = w_j/p_{ij}$ for a pair of jobs $j, k$, the
job with smaller index is scheduled first. For each machine $i = 1, \dots, m$ we define a
corresponding total order $(J, \prec_i)$ on the set of jobs by setting $j \prec_i k$ if either
$w_j/p_{ij} > w_k/p_{ik}$ or $w_j/p_{ij} = w_k/p_{ik}$ and $j < k$.



Lemma 1. In an optimal solution to the scheduling problem under consideration, the
jobs in each time slot $i_k$ are scheduled without interruption in order of nonincreasing
ratios $w_j/p_{ij}$. Furthermore, there exists an optimal solution where the jobs are
sequenced according to $\prec_i$ in each time slot $i_k$.

Notice that there may be several empty time slots $i_k$. This happens in particular if
$\rho_{i_k} = \rho_{i_{k+1}}$. Therefore it would be sufficient to introduce only $q_i$ time slots for
machine $i$ where $q_i$ is the number of different values $r_{ij}$, $j \in J$. For example, if there
are no nontrivial release dates (i. e., $r_{ij} = 0$ for all $i$ and $j$), we only need to introduce
one time slot $[0, \infty)$ on each machine. The problem $R \mid\mid \sum w_j C_j$ has been considered
in [22]; for this special case our approach coincides with the one given there.

Up to now we have described how a feasible schedule can be interpreted as a feasible
assignment of jobs to time slots. We call an assignment feasible if each job $j$ is
being assigned to a time slot $i_k$ with $\rho_{i_k} \ge r_{ij}$. On the other hand, for a given feasible
assignment of the jobs in $J$ to time slots we can easily construct a corresponding
feasible schedule: Sequence the jobs in time slot $i_k$ according to $\prec_i$ and start it as early
as possible after the jobs in the previous slot on machine $i$ are finished but not
before $\rho_{i_k}$; in other words, the starting time $s_{i_k}$ of time slot $i_k$ is given by
$s_{i_1} := \rho_{i_1}$ and $s_{i_{k+1}} := \max\{\rho_{i_{k+1}},\ s_{i_k} + \sum_{j \in i_k} p_{ij}\}$, for $k = 1, \dots, n-1$.
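
The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_schedule`, the 0-based indices and the input layout (`r[i][j]`, `p[i][j]`, `w[j]`, `assign[j] = (i, k)`) are hypothetical and not part of the paper, and all processing times are assumed to be positive.

```python
def build_schedule(r, p, w, assign):
    """Turn a feasible assignment of jobs to time slots into start times.

    r[i][j], p[i][j]: release date and processing time of job j on machine i.
    w[j]: weight of job j.
    assign[j] = (i, k): job j is assigned to slot k (0-based) on machine i.
    Returns a dict mapping each job to its start time.
    """
    m, n = len(r), len(w)
    start = {}
    for i in range(m):
        rho = sorted(r[i])          # rho_{i_1} <= ... <= rho_{i_n}
        s = rho[0]                  # s_{i_1} := rho_{i_1}
        for k in range(n):
            s = max(s, rho[k])      # a slot never starts before rho_{i_k}
            jobs = [j for j in range(n) if assign[j] == (i, k)]
            for j in jobs:
                assert rho[k] >= r[i][j], "assignment is not feasible"
            # Smith's Ratio Rule, ties broken by smaller index (the order <_i).
            jobs.sort(key=lambda j: (-w[j] / p[i][j], j))
            for j in jobs:
                start[j] = s
                s += p[i][j]        # s now equals s_{i_k} plus processed work
    return start


# One machine, two unit jobs released at times 0 and 2.
r, p, w = [[0, 2]], [[1, 1]], [1, 1]
print(build_schedule(r, p, w, [(0, 0), (0, 1)]))
print(build_schedule(r, p, w, [(0, 1), (0, 1)]))
```

In the second call both jobs sit in the slot that opens at time 2, so they are processed back to back from time 2 in index order.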

Lemma 2. Given its assignment of jobs to time slots, we can reconstruct an optimal 
schedule meeting the properties described in Lemma 1. 

We close this section with one final remark. Notice that several feasible assignments 
of jobs to time slots may lead to the same feasible schedule. Consider, e. g., an instance 
consisting of three jobs of unit length and unit weight that have to be scheduled on a 
single machine. Jobs 1 and 2 are released at time 0, while job 3 becomes available at 
time 1. We get an optimal schedule by processing the jobs without interruption in order 
of increasing numbers. This schedule corresponds to five different feasible assignments 
of jobs to time slots. We can assign job 1 to one of the first two slots, job 2 to the same 
or a later slot, and finally job 3 to slot 3. 
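
The count of five assignments can be checked by brute force. The following sketch enumerates all feasible assignments of the three jobs to the three slots of the single machine and keeps those that reproduce the schedule with start times 0, 1, 2. The helper `schedule` and the 0-based job and slot indices are illustrative choices, not notation from the paper.

```python
from itertools import product


def schedule(r, p, w, slots):
    """Single machine: slots[j] is the 0-based slot of job j; returns start times."""
    n = len(r)
    rho = sorted(r)
    s, start = rho[0], [None] * n
    for k in range(n):
        s = max(s, rho[k])  # slot k opens at rho[k] at the earliest
        in_slot = [j for j in range(n) if slots[j] == k]
        # Smith's Ratio Rule with ties broken by smaller index.
        for j in sorted(in_slot, key=lambda j: (-w[j] / p[j], j)):
            start[j] = s
            s += p[j]
    return start


r, p, w = [0, 0, 1], [1, 1, 1], [1, 1, 1]
rho = sorted(r)
target = [0, 1, 2]  # jobs processed without interruption in index order

matches = [
    slots
    for slots in product(range(3), repeat=3)
    if all(rho[slots[j]] >= r[j] for j in range(3))  # feasibility of assignment
    and schedule(r, p, w, slots) == target
]
print(len(matches))  # 5
for slots in matches:
    print([k + 1 for k in slots])  # 1-based slot of jobs 1, 2, 3
```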



3 A convex quadratic programming relaxation 



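As a consequence of Lemma 2 we have reduced the scheduling problem under
consideration to finding an optimal assignment of jobs to time slots. Therefore we can give a
formulation of $R \mid r_{ij} \mid \sum w_j C_j$ in assignment variables $a_{i_k j} \in \{0, 1\}$ where $a_{i_k j} = 1$
if job $j$ is being assigned to time slot $i_k$, and $a_{i_k j} = 0$ otherwise. This leads to the
following integer quadratic program:

\[
\begin{array}{llll}
\text{minimize}   & \sum_{j} w_j C_j & & \\[2pt]
\text{subject to} & \sum_{i,k} a_{i_k j} = 1 & \text{for all } j & (1) \\[2pt]
 & s_{i_1} = \rho_{i_1} & \text{for all } i & (2) \\[2pt]
 & s_{i_{k+1}} = \max\bigl\{\rho_{i_{k+1}},\ s_{i_k} + \sum_{j} a_{i_k j}\, p_{ij}\bigr\} & \text{for all } i, k & (3) \\[2pt]
 & C_j = \sum_{i,k} a_{i_k j} \Bigl( s_{i_k} + p_{ij} + \sum_{j' \prec_i j} a_{i_k j'}\, p_{ij'} \Bigr) & \text{for all } j & (4) \\[2pt]
 & a_{i_k j} = 0 & \text{if } \rho_{i_k} < r_{ij} & (5) \\[2pt]
 & a_{i_k j} \in \{0, 1\} & \text{for all } i, k, j &
\end{array}
\]
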


Constraints (1) ensure that each job is being assigned to exactly one time slot. In
constraints (2) and (3) we set the starting times of the time slots as described in Section 2.
If job $j$ is being assigned to time slot $i_k$, its completion time is the sum of the starting
time $s_{i_k}$ of this slot, its own processing time $p_{ij}$, and the processing times of other jobs
$j' \prec_i j$ that are also scheduled in this time slot. The right hand side of (4) is the sum of
these expressions over all time slots $i_k$ weighted by $a_{i_k j}$; it is thus equal to the
completion time of $j$. Finally, constraints (5) ensure that no job is being processed before its
release date.

It follows from our considerations in Section 2 that we could replace (5) by the
stronger constraint

\[
a_{i_k j} = 0 \qquad \text{if } \rho_{i_k} < r_{ij} \text{ or } \rho_{i_k} = \rho_{i_{k+1}}
\]

which reduces the number of available time slots on each machine. For the special case
$R \mid\mid \sum w_j C_j$ this leads to the integer quadratic program that has been introduced in [22].
It is also shown there that it is still NP-hard to solve the continuous relaxation of this
integer quadratic program; however, it can be solved in polynomial time if the term $p_{ij}$
on the right hand side of (4) is replaced by $p_{ij}(1 + a_{i_k j})/2$.

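Observe that this replacement does not affect the value of the integer quadratic
program since the new term is equal to $p_{ij}$ whenever $a_{i_k j} = 1$. This motivates the study of
the following quadratic programming relaxation (QP) for the general problem including
release dates:

\[
(QP)\quad
\begin{array}{llll}
\text{minimize}   & \sum_{j} w_j C_j & & \\[2pt]
\text{subject to} & \sum_{i,k} a_{i_k j} = 1 & \text{for all } j & \\[2pt]
 & s_{i_1} = \rho_{i_1} & \text{for all } i & (6) \\[2pt]
 & s_{i_{k+1}} = \max\bigl\{\rho_{i_{k+1}},\ s_{i_k} + \sum_{j} a_{i_k j}\, p_{ij}\bigr\} & \text{for all } i, k & (7) \\[2pt]
 & C_j = \sum_{i,k} a_{i_k j} \Bigl( s_{i_k} + \frac{1 + a_{i_k j}}{2}\, p_{ij} + \sum_{j' \prec_i j} a_{i_k j'}\, p_{ij'} \Bigr) & \text{for all } j & (8) \\[2pt]
 & a_{i_k j} = 0 & \text{if } \rho_{i_k} < r_{ij} & \\[2pt]
 & a_{i_k j} \ge 0 & \text{for all } i, k, j &
\end{array}
\]
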


Notice that a solution to this program is uniquely determined by giving the values of
the assignment variables $a_{i_k j}$. In contrast to the case without nontrivial release dates,
we cannot directly prove that this quadratic program is convex. Nevertheless, in the
remaining part of this section we will show that it can be solved in polynomial time.
The main idea is to show that one can restrict to solutions satisfying $s_{i_k} = \rho_{i_k}$ for all $i$
and $k$. Adding these constraints to (QP) then leads to a convex quadratic program.



Lemma 3. For all instances of $R \mid r_{ij} \mid \sum w_j C_j$ there exists an optimal solution to (QP)
satisfying $s_{i_k} = \rho_{i_k}$ for all $i$ and $k$.

As a consequence of Lemma 3 we can replace the variables $s_{i_k}$ in (QP) by the
constants $\rho_{i_k}$, by changing constraints (7) to

\[
\sum_{j} a_{i_k j}\, p_{ij} \le \rho_{i_{k+1}} - \rho_{i_k} \qquad \text{for all } i, k.
\]


Furthermore, if we remove constraints (8) and replace Cj in the objective function by 
the right hand side of (8), we can reformulate the quadratic programming relaxation as 
follows: 

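\[
(CQP)\quad
\begin{array}{llll}
\text{minimize}   & b^T a + \tfrac{1}{2}\, a^T D a & & (9) \\[2pt]
\text{subject to} & \sum_{i,k} a_{i_k j} = 1 & \text{for all } j & (10) \\[2pt]
 & \sum_{j} a_{i_k j}\, p_{ij} \le \rho_{i_{k+1}} - \rho_{i_k} & \text{for all } i, k & (11) \\[2pt]
 & a_{i_k j} = 0 & \text{if } \rho_{i_k} < r_{ij} & \\[2pt]
 & a \ge 0 & &
\end{array}
\]

Here, $a \in \mathbb{R}^{mn^2}$ denotes the vector consisting of all variables $a_{i_k j}$ lexicographically
ordered with respect to the natural order $1_1, 1_2, \dots$ of the time slots and then, for
each slot $i_k$, the jobs ordered according to $\prec_i$. The vector $b \in \mathbb{R}^{mn^2}$ is given by
$b_{i_k j} = \tfrac{1}{2} w_j p_{ij} + w_j \rho_{i_k}$, and $D = (d_{(i_k j)(i'_{k'} j')})$ is a symmetric
$mn^2 \times mn^2$-matrix given through

\[
d_{(i_k j)(i'_{k'} j')} =
\begin{cases}
0 & \text{if } i_k \ne i'_{k'}, \\
w_{j'}\, p_{ij} & \text{if } i_k = i'_{k'} \text{ and } j \prec_i j', \\
w_j\, p_{ij'} & \text{if } i_k = i'_{k'} \text{ and } j' \prec_i j, \\
w_j\, p_{ij} & \text{if } i_k = i'_{k'} \text{ and } j = j'.
\end{cases}
\]
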

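Because of the lexicographic order of the indices the matrix $D$ is decomposed into $mn$
diagonal blocks corresponding to the $mn$ time slots. If we assume that the jobs are
indexed according to $\prec_i$ and if we denote $p_{ij}$ simply by $p_j$, each block corresponding
to a time slot on machine $i$ has the following form:

\[
\begin{pmatrix}
w_1 p_1 & w_2 p_1 & w_3 p_1 & \cdots & w_n p_1 \\
w_2 p_1 & w_2 p_2 & w_3 p_2 & \cdots & w_n p_2 \\
w_3 p_1 & w_3 p_2 & w_3 p_3 & \cdots & w_n p_3 \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
w_n p_1 & w_n p_2 & w_n p_3 & \cdots & w_n p_n
\end{pmatrix}
\]
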
It has been observed in [22] that those matrices are positive semidefinite and therefore 
the whole matrix D is positive semidefinite. In particular, the objective function (9) is 
convex and the quadratic programming relaxation can be solved in polynomial time, 
see, e. g., [11, 3]. 
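
One way to see the positive semidefiniteness of a single block is sketched below. This is not necessarily the argument used in [22]; it assumes $p_j > 0$ and $w_j \ge 0$ for all jobs, jobs indexed according to $\prec_i$, and it introduces the auxiliary symbols $\sigma_j$, $M$, $P$ and $\mathbf{1}_{[l]}$, which do not appear in the paper.

```latex
% Block entries, read off from the definition of D (jobs indexed by \prec_i):
\[
  d_{jj'} \;=\; w_{\max\{j,j'\}}\; p_{\min\{j,j'\}}
          \;=\; p_j\, p_{j'}\, \min\{\sigma_j, \sigma_{j'}\},
  \qquad
  \sigma_j := \frac{w_j}{p_j}, \quad
  \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0 .
\]
% The "min" matrix is a nonnegative combination of rank-one matrices.
% Let \mathbf{1}_{[l]} be the 0/1 vector with ones in the first l coordinates
% and set \sigma_{n+1} := 0. Then
\[
  M := \bigl(\min\{\sigma_j, \sigma_{j'}\}\bigr)_{j,j'}
     \;=\; \sum_{l=1}^{n} (\sigma_l - \sigma_{l+1})\,
           \mathbf{1}_{[l]} \mathbf{1}_{[l]}^{T} \;\succeq\; 0 ,
\]
% since entry (j,j') of the sum collects the terms with l \ge \max\{j,j'\},
% which telescope to \sigma_{\max\{j,j'\}} = \min\{\sigma_j,\sigma_{j'}\}.
% With P := \operatorname{diag}(p_1,\dots,p_n) the block equals P M P, hence
\[
  x^{T} (P M P)\, x \;=\; (P x)^{T} M\, (P x) \;\ge\; 0
  \qquad \text{for all } x \in \mathbb{R}^{n}.
\]
```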

The convex quadratic programming relaxation (CQP) is in some sense similar to
the linear programming relaxation in time-indexed variables that has been introduced
in [18]. Without going into the details, we give a rough idea of the common underlying
intuition of both relaxations: a job may be split into several parts (corresponding to
fractional values $a_{i_k j}$ in (CQP)) which can be scattered over the machines and over time. The
completion time of a job in such a ‘fractional schedule’ is somehow related to its mean
busy time; the mean busy time of a job is the average point in time at which its fractions
are being processed (see (8) where $C_j$ is set to the average over the terms in brackets on
the right hand side weighted by $a_{i_k j}$). However, in contrast to the time-indexed LP
relaxation, the construction of the convex quadratic program (CQP) contains more insights
into the structure of an optimal schedule. As a result, (CQP) is of strongly polynomial
size while the LP relaxation contains an exponential number of time-indexed variables
and constraints.



4 A simple 2-approximation algorithm 

The value of an optimal solution to the convex quadratic programming relaxation (CQP)
of the last section is a lower bound on the value of an optimal schedule. Moreover, from 
the structure of an optimal solution to the relaxation we can gain important insights that 
turn out to be useful in the construction of a provably good solution to the scheduling 
problem under consideration. In this context, randomized rounding has proved to be a 
powerful algorithmic tool. On the one hand, it yields very simple and easy to analyze
algorithms; on the other hand, it is able to minutely capture the structure of the solution
to the relaxation and to carry it over to a feasible schedule. The idea of using
randomized rounding in the study of approximation algorithms was introduced by Raghavan
and Thompson [17]; an overview can be found in [13].

For a given optimal solution $a$ to (CQP), we compute an integral solution $\bar{a}$ by
setting for each job $j$ exactly one of the variables $\bar{a}_{i_k j}$ to 1 with probabilities given
through $a_{i_k j}$. Notice that $0 \le a_{i_k j} \le 1$ and the sum of the $a_{i_k j}$ for job $j$ is equal to one by
constraints (10). Although the integral solution $\bar{a}$ does not necessarily fulfill constraints
(11), it represents a feasible assignment of jobs to time slots, i. e., a feasible solution to
(QP), and thus a feasible schedule. For our analysis we require that the random choices
are performed pairwise independently for the jobs.
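
The rounding step can be sketched as follows. The function name `round_assignment` and the dictionary layout of the fractional solution are hypothetical; the sketch draws the jobs fully independently, which is one simple way to satisfy the pairwise independence required above, and it does not solve (CQP) itself.

```python
import random


def round_assignment(a, rng=random):
    """Randomized rounding of a fractional assignment.

    a[j] maps each time slot (i, k) to the fractional value a_{i_k j};
    the values for one job are nonnegative and sum to 1.
    Returns a dict job -> chosen slot, where slot (i, k) is picked
    with probability a_{i_k j}, independently for every job.
    """
    chosen = {}
    for j, frac in a.items():
        slots = list(frac)
        weights = [frac[s] for s in slots]
        chosen[j] = rng.choices(slots, weights=weights, k=1)[0]
    return chosen


# Hypothetical fractional solution for two jobs on two machines.
a = {
    1: {(0, 0): 0.5, (1, 0): 0.5},   # job 1 split evenly between two slots
    2: {(0, 0): 0.0, (0, 1): 1.0},   # job 2 already integral
}
print(round_assignment(a, random.Random(0)))
```

Passing a seeded `random.Random` instance makes the draw reproducible; slots with value 0, such as those excluded by the release date constraint, are never selected.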

Theorem 4. Computing an optimal solution to (CQP) and using randomized rounding
to turn it into a feasible schedule is a 2-approximation algorithm for the problem
$R \mid r_{ij} \mid \sum w_j C_j$.

Theorem 4 follows from the next lemma which gives a slightly stronger result in- 
cluding job-by-job bounds. 

Lemma 5. Using randomized rounding in order to turn an arbitrary feasible solution
to (QP) into a feasible assignment of jobs to time slots yields a schedule such that the
expected completion time of each job is bounded by twice the corresponding value (8)
in the given solution to (QP).

Since the value of an optimal solution to (CQP) is a lower bound on the value of an
optimal schedule, Theorem 4 follows from Lemma 5 and linearity of expectations.

Our result on the quality of the computed schedule described in Theorem 4 also 
implies a bound on the quality of the quadratic programming relaxation that served as 
a lower bound in our estimations. 

Corollary 6. For instances of $R \mid r_{ij} \mid \sum w_j C_j$, the value of an optimal solution to the
relaxation (CQP) is within a factor 2 of the value of an optimal schedule. This bound is
tight even for the case of identical parallel machines without release dates $P \mid\mid \sum w_j C_j$.

5 Extensions to scheduling with preemptions 

In this section we discuss the preemptive problem $R \mid r_{ij}, pmtn \mid \sum w_j C_j$ and
generalizations to network scheduling. In contrast to the nonpreemptive setting, a job may now
repeatedly be interrupted and continued later on another (or the same) machine. In the
context of network scheduling, it is reasonable to assume that after a job has been
interrupted on one machine it cannot be continued on another machine until a certain
communication delay has elapsed that allows the job to travel through the network to its
new machine.

The ideas and techniques presented in the last section can be generalized to this 
setting. However, since we have to use a somewhat weaker relaxation in order to capture 
the possibility of preemptions, we only get a 3-approximation algorithm. This result
can be improved to performance guarantee 2 in the absence of nontrivial release dates
$R \mid pmtn \mid \sum w_j C_j$ but with arbitrary communication delays. For reasons of brevity we
only give a brief sketch of the main differences to the nonpreemptive setting.

Although the quadratic program (QP) allows us to break a job into fractions and thus to
preempt it by choosing fractional values $a_{i_k j}$, it is not a relaxation of
$R \mid r_{ij}, pmtn \mid \sum w_j C_j$. However, we can turn it into a relaxation by replacing (8) with
the weaker constraint

\[
C_j = \sum_{i,k} a_{i_k j} \Bigl( s_{i_k} + \frac{a_{i_k j}}{2}\, p_{ij} + \sum_{j' \prec_i j} a_{i_k j'}\, p_{ij'} \Bigr)
\qquad \text{for all } j.
\]

Moreover, we restrengthen the relaxation by adding the following constraint

\[
\sum_{j} w_j C_j \;\ge\; \sum_{j} w_j \sum_{i,k} a_{i_k j}\, p_{ij}
\]

which bounds the objective value from below by the weighted sum of processing times
(a similar constraint has already been used in [22]). Since Lemma 3 can be carried over
to the new setting, we again get a convex quadratic programming relaxation.

In order to turn an optimal solution to this relaxation into a feasible schedule, we 
apply exactly the same randomized rounding heuristic as in the nonpreemptive case. In 
particular, we do not make use of the possibility to preempt jobs but compute a nonpre- 
emptive schedule. Therefore, our results hold for the case of arbitrary communication 
delays. 

Theorem 7. Randomized rounding based on an optimal solution to the convex quadratic
programming relaxation yields a 3-approximation algorithm for $R \mid r_{ij}, pmtn \mid \sum w_j C_j$
and a 2-approximation algorithm for $R \mid pmtn \mid \sum w_j C_j$, even for the case of arbitrary
communication delays. The same bounds hold for the quality of the relaxation.

Theorem 7 also implies bounds on the power of preemption. Since we can compute 
a nonpreemptive schedule whose value is bounded by 3 respectively 2 times the value of 
an optimal preemptive schedule, we have derived upper bounds on the ratios of optimal 
nonpreemptive to optimal preemptive schedules. 

Corollary 8. For instances of $R \mid r_{ij} \mid \sum w_j C_j$, the value of an optimal nonpreemptive
schedule is at most a factor 3 above the value of an optimal preemptive schedule. In the
absence of nontrivial release dates, this bound can be improved to 2.



6 Conclusion 

We have presented convex quadratic programming relaxations of strongly polynomial 
size which lead to simple and easy to analyze approximation algorithms for preemp- 
tive and nonpreemptive network scheduling. Although our approach and the presented 
results might be at first sight of mainly theoretical interest, we hope that nonlinear relax- 
ations like the one we discuss in this paper will also prove useful in solving real world 
scheduling problems in the near future. With the development of better algorithms that 
solve convex quadratic programs more efficiently in practice, the results obtained by
using such relaxations might become comparable to or even better than those based on
linear programming relaxations with a huge number of time-indexed variables and
constraints.

Precedence constraints between jobs play a particularly important role in most real 
world scheduling problems. Therefore it would be both of theoretical and of practical 
interest to incorporate those constraints into our convex quadratic programming relax- 
ation. 

Hoogeveen, Schuurman, and Woeginger [10] have shown that the problems
$R \mid r_j \mid \sum C_j$ and $R \mid\mid \sum w_j C_j$ cannot be approximated in polynomial time within
arbitrarily good precision, unless P=NP. It is an interesting open problem to close the gap
between this negative result and the 2-approximation algorithm presented in this paper.
Abstract. We present a novel approach to compute Lagrangian lower bounds on
the objective function value of a wide class of resource-constrained project
scheduling problems. The basis is a polynomial-time algorithm to solve the following
scheduling problem: Given a set of activities with start-time dependent costs and
temporal constraints in the form of time windows, find a feasible schedule of
minimum total cost. In fact, we show that any instance of this problem can be solved
by a minimum cut computation in a certain directed graph.

We then discuss the performance of the proposed Lagrangian approach when
applied to various types of resource-constrained project scheduling problems. An
extensive computational study based on different established test beds in
project scheduling shows that it can significantly improve upon the quality of other
comparably fast computable lower bounds.



1 Introduction and Problem Formulation 

Resource-constrained project scheduling problems usually comprise several activities 
or jobs which have to be scheduled subject to both temporal and resource constraints in 
order to minimize a certain objective. Temporal constraints often consist of precedence 
constraints, that is, certain activities must be completed before others can be processed, 
but sometimes even arbitrary minimal and maximal time lags, so-called time windows 
between pairs of activities have to be respected. Moreover, activities require resources 
while being processed, and the resource availability is limited. Also time-varying re- 
source requirements and resource availabilities may occur. Most frequently, the project 
makespan is to be minimized, but also other, even non-regular objective functions are 
considered in the literature. For a detailed account of the various problem settings, most 
relevant references as well as a classification scheme for resource-constrained project 
scheduling problems we refer to [2]. 
In general, resource-constrained project scheduling problems are among the most
intractable problems, and in the case of time windows even the problem of finding a
feasible solution is NP-hard, cf., e.g., [1]. The intractability of these problems motivates
the search for good and fast computable lower bounds on the objective value, which may
be used to improve both heuristics and exact procedures. However, it is very unlikely
that provably good lower bounds can be computed within polynomial time, since project
scheduling problems contain node coloring in graphs as a special case [20]. Thus, as for
node coloring, there is no polynomial-time approximation algorithm with a performance
guarantee less than $n^{\varepsilon}$ for some $\varepsilon > 0$, unless P = NP. This negative result also implies
limits on the computation of good lower bounds.

Problem formulation. Let $V = \{0, \dots, n+1\}$ be a set of activities $j$ with integral
activity durations $p_j$. All activities must be scheduled non-preemptively, and by
$S = (S_0, \dots, S_{n+1})$ we denote a schedule, where $S_j$ is the start time of activity $j$. Activities $0$
and $n+1$ are assumed to be dummy activities indicating the project start and the project
completion, respectively. Temporal constraints in the form of minimal and maximal
time lags between pairs of activities are given. By $d_{ij}$ we denote a time lag between
two activities $i, j \in V$, and $L \subseteq V \times V$ is the set of all time lags. We assume that
the temporal constraints always refer to the start times, thus every schedule $S$ has to
fulfill $S_j \ge S_i + d_{ij}$ for all $(i, j) \in L$. Note that $d_{ij} \ge 0$ ($d_{ji} < 0$) implies a minimal
(maximal) positive time lag of $S_j$ relative to $S_i$, thus so-called time windows of the
form $S_i + d_{ij} \le S_j \le S_i - d_{ji}$ between any two activities can be modeled. Ordinary
precedence constraints can be represented by letting $d_{ij} = p_i$ if activity $i$ must precede
activity $j$. Additionally, we suppose that a time horizon $T$ as an upper bound on the project
makespan is given. It can be checked in polynomial time by longest path calculations
if such a system of temporal constraints has a feasible solution. Throughout the paper
we will assume that a schedule exists that satisfies all temporal constraints. We then
obtain for each activity a set of (integral) feasible start times $I_j := \{ES_j, \dots, LS_j\}$,
$j \in V$, where $ES_j$ and $LS_j$ denote the earliest and latest start time of activity $j$,
respectively. Activities need resources for their processing. In the model with constant
resource requirements, we are given a finite set $\mathcal{R}$ of different, renewable resources, and
the availability of resource $k \in \mathcal{R}$ is denoted by $R_k$, that is, an amount of $R_k$ units of
resource $k$ is available throughout the project. Every activity $j$ requires an amount of
$r_{jk}$ units of resource $k$, $k \in \mathcal{R}$. The activities have to be scheduled such as to minimize
a given measure of performance, usually the project makespan.

Project scheduling problems are often formulated as integer linear programs with
time-indexed binary variables $x_{jt}$, $j \in V$, $t \in \{0, \dots, T\}$, which are defined by $x_{jt} = 1$
if activity $j$ starts at time $t$ and $x_{jt} = 0$ otherwise. This leads to the following, well
known integer linear programming formulation.

minimize

$x_{jt} \in \{0, 1\}$,    $j \in V$, $t \in \{0, \dots, T\}$.    (5)


Constraints (2) indicate that each activity is started exactly once, and inequalities (3)
represent the temporal constraints given by the time lags $L$. Inequalities (4) assure that
the activities processed simultaneously at time $t$ do not consume more resources than
available. Note that this formulation can easily be generalized to time-dependent resource
profiles, i.e., $R_k = R_k(t)$ and $r_{jk} = r_{jk}(t)$. In fact, although we discuss in the following
the case of time-independent resource profiles only, the presented results carry over to
the general case. Computational results for both models are discussed in Sect. 3.

Related work. The above time-indexed formulation for project scheduling problems
has been used before by various authors (e.g. [18,7,5,4]), sometimes with a weaker
formulation of temporal constraints given by $\sum_t t\,(x_{jt} - x_{it}) \ge d_{ij}$, $(i,j) \in L$. Most
relevant to our work is the paper by Christofides, Alvarez-Valdes, and Tamarit [7]. They
have investigated a Lagrangian relaxation of the above integer program in order to obtain
lower bounds on the makespan. They solve the Lagrangian relaxation with the help of a
branch and bound algorithm, apparently unaware that it can be solved in polynomial time
by purely combinatorial methods (see Sect. 2). As a matter of fact, the LP relaxation of (2),
(3), and (5) is known to be integral. This important structural result is due to Chaudhuri,
Walker, and Mitchell [5]. For problems with precedence constraints and time-varying
resources, this has also been shown by Cavalcante, De Souza, Savelsbergh, Wang, and
Wolsey [4]. The latter authors solve the linear programming relaxation of (1) - (5) in order
to exploit its solution for ordering heuristics to construct good feasible schedules. Another
technique to compute lower bounds on the project makespan for resource-constrained
project scheduling problems has been proposed by Mingozzi, Maniezzo, Ricciardelli,
and Bianco [14]. Their approach relies on a different mathematical formulation that
is based on variables $y_{\ell t}$ which indicate if a (resource feasible) subset of activities
$V_\ell \subseteq V$ is in process at a certain time $t$. Clearly, this formulation is of exponential
size, since there are exponentially many such feasible subsets $V_\ell$. They derive different
lower bounds by considering several relaxations, including a very fast computable lower
bound, usually referred to as $LB_3$, which is based on the idea to sum up the processing
times of activities which pairwise cannot be scheduled simultaneously. Their bounds
have then been evaluated and modified by various authors. In particular, Brucker and
Knust [3] solve the following relaxation: Feasible subsets of activities must be scheduled
(preemptively) such that every activity receives at least its total processing time. Brucker
and Knust apply column generation where the pricing is done by branch and bound. They
obtain the best known bounds on the majority of instances of a well known test bed [13].
However, their approach often requires extremely large computation times.

Results. In the spirit of Christofides, Alvarez-Valdes, and Tamarit [7], we propose
a Lagrangian relaxation of (1) - (5) to compute lower bounds for resource-constrained
project scheduling problems. Within a subgradient optimization algorithm we solve a
series of project scheduling problems given by (2), (3), and (5), subject to start time
dependent costs for each activity. The core of our approach is a direct transformation
of this problem to a minimum cut problem in an appropriately defined directed graph
which can then be solved by a standard maximum flow algorithm. The potential of this
approach is demonstrated by computational results. We have used widely accepted test
beds in project scheduling, namely problems with ordinary precedence constraints [13]
as well as arbitrary minimal and maximal time lags [21], and labor-constrained
scheduling problems with a time varying resource profile modeled after chemical production
processes within BASF AG, Germany [12]. The experiments reveal that our approach is
capable of computing very good lower bounds at very short computation times. We thus
improve previous, fast computable lower bounds, and in the setting with time windows
we even obtain best known lower bounds for quite a few instances. Compared to other
approaches which partially require prohibitive running times, our algorithm offers a
good tradeoff between quality and computation time. It also turns out that it is especially
suited for problems with extremely scarce resources, which are the problems that tend
to be intractable. For the instances stemming from BASF, Cavalcante et al. [4] report
on tremendous computation times for solving the corresponding linear programming
relaxations. Our experiments show that one can obtain essentially the same value as
with the LP relaxation much more efficiently.

Organization of the paper. In Sect. 2 we present the Lagrangian relaxation of the
integer program (1) - (5), and introduce a direct transformation of the resulting subproblems
to minimum cut problems in an appropriate directed graph. Section 3 is then concerned
with an extensive computational study of this approach. We analyze our algorithm in
comparison to both the solution of the corresponding LP relaxations, and other lower
bounding algorithms. We conclude with some remarks on future research in Sect. 4. Due
to space limitations, quite some details have been omitted from this extended abstract.
They will be presented in the full version [17].



2 The Lagrangian Relaxation 

Christofides, Alvarez-Valdes, and Tamarit [7] have proposed the following Lagrangian
relaxation of the time indexed integer programming formulation of resource-constrained
project scheduling given by (1) - (5). They dualize the resource constraints (4), and
introduce Lagrangian multipliers $\lambda_{tk} \ge 0$, $t \in \{0, \dots, T\}$, $k \in \mathcal{R}$.

\[
\text{minimize}\quad
\sum_{t} t\, x_{n+1,t}
+ \sum_{j} \sum_{t} \Bigl( \sum_{k \in \mathcal{R}} r_{jk} \sum_{s=t}^{t+p_j-1} \lambda_{sk} \Bigr) x_{jt}
- \sum_{t} \sum_{k \in \mathcal{R}} \lambda_{tk}\, R_k
\qquad (6)
\]

subject to (2), (3), and (5).

If one omits the constant term $\sum_{t} \sum_{k \in \mathcal{R}} \lambda_{tk} R_k$ and introduces weights
$w_{jt} = \sum_{k \in \mathcal{R}} r_{jk} \sum_{s=t}^{t+p_j-1} \lambda_{sk}$ for all $j \ne n+1$ and $w_{n+1,t} = t$, (6) can be
reformulated as

\[
\text{minimize}\quad c(x) := \sum_{j} \sum_{t} w_{jt}\, x_{jt}
\quad \text{subject to (2), (3), and (5).} \qquad (7)
\]


This formulation specifies a project scheduling problem where the activities have start- 
time dependent costs, and where the aim is to minimize the overall cost subject to 
minimal and maximal time lags between activities. We refer to this problem as project 
scheduling problem with start-time dependent costs. Note that all weights can without 
loss of generality be assumed to be positive, since due to (2), any additive transformation 
of the weights only affects the solution value, but not the solution itself. (In (7) the 
weights are non-negative by definition.) The problem can trivially be solved by longest 
path calculations if the $w_{jt}$ are non-decreasing in $t$. However, this is not the case for
general weights. 
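
As an illustration, the weights of (7) can be computed from given multipliers as sketched below. The function name `lagrangian_weights`, the list-based data layout and the `sink` parameter are assumptions made for this example; multipliers for time points beyond the horizon $T$ are treated as zero, which is also an assumption.

```python
def lagrangian_weights(p, r, lam, T, sink=None):
    """Start-time dependent costs w[j][t] for t = 0..T.

    p[j]: duration of activity j, r[j][k]: requirement of resource k,
    lam[s][k] >= 0: Lagrangian multiplier of resource k at time s (s = 0..T).
    w[j][t] = sum_k r[j][k] * sum_{s=t}^{t+p[j]-1} lam[s][k];
    multipliers with s > T are treated as 0 (assumption).
    If sink is given, that activity plays the role of n+1 and gets w[sink][t] = t.
    """
    num_res = len(lam[0])
    w = []
    for j in range(len(p)):
        row = []
        for t in range(T + 1):
            last = min(t + p[j], T + 1)   # exclusive end of the summation range
            row.append(sum(r[j][k] * lam[s][k]
                           for k in range(num_res)
                           for s in range(t, last)))
        w.append(row)
    if sink is not None:
        w[sink] = list(range(T + 1))      # makespan term: w_{n+1,t} = t
    return w


# One real activity of duration 2, one dummy sink, one resource, T = 2.
w = lagrangian_weights(p=[2, 0], r=[[1], [0]], lam=[[1], [2], [3]], T=2, sink=1)
print(w)  # [[3, 5, 3], [0, 1, 2]]
```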

Observe also that the above Lagrangian relaxation is not restricted to makespan 
minimization, but can as well be applied to any other regular, and even non-regular 
objective function. Thus the procedure proposed below is applicable to a variety of 
project scheduling problems like, for instance, the minimization of the weighted sum 
of completion times, problems that aim at minimizing lateness, or resource investment 
problems [15,8]. 

The transformation. We now present a reduction of the project scheduling problem
with start-time dependent costs given in (7) to a minimum cut problem in a directed
graph $D = (N, A)$ which is defined as follows.




Fig. 1. The left digraph represents the relevant data of the underlying example: Each 
node represents an activity, each arc represents a temporal constraint. The right digraph 
D is the corresponding graph obtained by the transformation. Each assignment arc of 
D corresponds to a binary variable Xjt. Arcs marked by a white arrow head are dummy 
arcs that connect the source a and the sink b with the remaining network. 



• Nodes. The set $N$ of nodes contains for each activity $j \in V$ the nodes $u_{jt}$, $t \in \{ES_j, \dots, LS_j + 1\}$.
Furthermore, it contains two auxiliary nodes, a dummy source $a$ and a dummy
sink $b$. Hence, $N := \{a, b\} \cup \{u_{jt} \mid j \in V, t \in \{ES_j, \dots, LS_j + 1\}\}$.

• Arcs. The arc set $A$ can be divided into three disjoint subsets. The set of assignment
arcs is induced by the binary variables $x_{jt}$ of the integer program (7) and contains all
arcs $(u_{jt}, u_{j,t+1})$ for all $j \in V$ and $t \in I_j$, where $x_{jt}$ corresponds to $(u_{jt}, u_{j,t+1})$. The
set of temporal arcs is defined as the set $\{(u_{is}, u_{j,s+d_{ij}}) \mid (i,j) \in L, s \in I_i\}$. It will later
be needed to guarantee that no temporal constraint is violated. Finally, a set of dummy
arcs connects the source and the sink nodes $a$ and $b$ with the remaining network. The
dummy arcs are given by $\{(a, u_{i,ES_i}) \mid i \in V\} \cup \{(u_{i,LS_i+1}, b) \mid i \in V\}$.

• Capacities. The capacity of an assignment arc $(u_{jt}, u_{j,t+1})$, $j \in V$, $t \in I_j$, equals
the weight $w_{jt}$ of its associated binary variable $x_{jt}$, the capacity of temporal arcs and
dummy arcs is infinite, and all lower capacities are 0.

Figure 1 shows an example of the graph D based on an instance with 5 activities 
$V = \{1, \dots, 5\}$. The set of time lags is $\{d_{12} = 1, d_{23} = -2, d_{34} = 2, d_{54} = 3\}$. The
activity durations are $p_1 = p_4 = 1$, $p_2 = p_5 = 2$, and $p_3 = 3$, and $T = 6$ is a given upper
bound on the project makespan. Thus the earliest start vector is $ES = (0, 1, 0, 3, 0)$ and
the latest start vector is $LS = (3, 4, 3, 5, 2)$.
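
The start-time windows of this example can be recomputed with a simple label-correcting longest-path pass. The sketch below is only an illustration: the function name is hypothetical, activities are numbered from 0, it assumes every activity must finish by $T$ (so the initial latest start of $j$ is $T - p_j$), and it does not detect infeasible instances.

```python
def earliest_latest_starts(durations, lags, horizon):
    """Compute ES and LS from time lags d_ij, meaning S_j >= S_i + d_ij.

    durations : list of activity durations p_j
    lags      : dict mapping (i, j) to d_ij
    horizon   : upper bound T on the project makespan
    """
    n = len(durations)
    es = [0] * n
    ls = [horizon - p for p in durations]
    # n passes are enough for all longest paths in a feasible instance
    # (Bellman-Ford style relaxation); infeasibility is not checked.
    for _ in range(n):
        for (i, j), d in lags.items():
            es[j] = max(es[j], es[i] + d)
            ls[i] = min(ls[i], ls[j] - d)
    return es, ls


# The five-activity example above, with activities renumbered 0..4.
EXAMPLE_LAGS = {(0, 1): 1, (1, 2): -2, (2, 3): 2, (4, 3): 3}
es, ls = earliest_latest_starts([1, 2, 3, 1, 2], EXAMPLE_LAGS, 6)
```

Under these assumptions the sketch reproduces the two vectors stated above, which is also a consistency check on the reconstructed time lags and durations.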

We use the following notation. Given a directed graph, an $a,b$-cut is a pair $(X, \bar{X})$
of disjoint sets $X, \bar{X} \subseteq N$ with $X \cup \bar{X} = N$, and $a \in X$, $b \in \bar{X}$. The capacity $c(X, \bar{X})$
of a cut $(X, \bar{X})$ is the sum of capacities of the forward arcs in the cut, $c(X, \bar{X}) :=
\sum_{(u,v) \in (X, \bar{X})} c(u, v)$.

Theorem 1. A minimum $a,b$-cut $(X^*, \bar{X}^*)$, $a \in X^*$, $b \in \bar{X}^*$, of the digraph $D$ corresponds
to an optimal solution $x^*$ of the integer program (7) of the project scheduling
problem with start-time dependent costs by setting

$$x^*_{jt} = \begin{cases} 1 & \text{if } (u_{jt}, u_{j,t+1}) \text{ is a forward arc of the cut } (X^*, \bar{X}^*), \\ 0 & \text{otherwise.} \end{cases}$$

Moreover, the value $c(x^*)$ of that solution equals the capacity $c(X^*, \bar{X}^*)$ of the minimum
cut $(X^*, \bar{X}^*)$.

The proof crucially uses the fact that each minimum $a,b$-cut of the digraph $D$ consists of
exactly one forward arc $(u_{jt}, u_{j,t+1})$ for every activity $j$. Note that this only holds since
the weights $w_{jt}$ are strictly positive and thus also the capacities of the arcs are strictly
positive. Furthermore, it is essential that the given instance has a feasible solution and 
thus a minimum cut has finite capacity. 

Since $D$ has $O(n \cdot T)$ nodes and $O((n + m) \cdot T)$ arcs, a minimum cut in $D$ can
be computed in $O(nmT^2 \log T)$ time with the classical push-relabel algorithm for
maximum flows [9]. Here, $m$ is the number of given time lags $L$.
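
The construction can be tried out on the example of Figure 1. The sketch below is an illustration under stated assumptions, not the implementation used in the paper: it replaces push-relabel by a plain Edmonds-Karp augmenting-path routine, uses a large finite number in place of infinite capacities, skips temporal arcs whose head would lie outside a node chain, and uses made-up weights `W`. The minimum cut value is compared against a brute-force enumeration of all feasible start times.

```python
from collections import deque
from itertools import product


def add_arc(cap, u, v, c):
    cap.setdefault(u, {})
    cap.setdefault(v, {})
    cap[u][v] = cap[u].get(v, 0) + c
    cap[v].setdefault(u, 0)  # residual (backward) arc


def build_network(es, ls, lags, w):
    big = sum(w.values()) + 1  # finite stand-in for infinite capacity
    cap = {}
    for j in es:
        add_arc(cap, "a", (j, es[j]), big)        # dummy arc from source
        add_arc(cap, (j, ls[j] + 1), "b", big)    # dummy arc to sink
        for t in range(es[j], ls[j] + 1):         # assignment arcs
            add_arc(cap, (j, t), (j, t + 1), w[j, t])
    for (i, j), d in lags.items():                # temporal arcs
        for s in range(es[i], ls[i] + 1):
            if es[j] <= s + d <= ls[j] + 1:
                add_arc(cap, (i, s), (j, s + d), big)
    return cap


def max_flow(cap, source, sink):
    """Edmonds-Karp; the flow value equals the minimum cut capacity."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def brute_force(es, ls, lags, w):
    acts = sorted(es)
    costs = []
    for starts in product(*(range(es[j], ls[j] + 1) for j in acts)):
        s = dict(zip(acts, starts))
        if all(s[j] >= s[i] + d for (i, j), d in lags.items()):
            costs.append(sum(w[j, s[j]] for j in acts))
    return min(costs)


ES = {1: 0, 2: 1, 3: 0, 4: 3, 5: 0}
LS = {1: 3, 2: 4, 3: 3, 4: 5, 5: 2}
LAGS = {(1, 2): 1, (2, 3): -2, (3, 4): 2, (5, 4): 3}
# Hypothetical strictly positive weights, chosen only for illustration.
W = {(j, t): (3 * j + 5 * t) % 7 + 1 for j in ES for t in range(ES[j], LS[j] + 1)}
cut_value = max_flow(build_network(ES, LS, LAGS, W), "a", "b")
```

A specialised maximum flow code, such as the one cited in Section 3.2, would be the natural choice for instances of realistic size; the augmenting-path routine here only keeps the sketch short.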

A related transformation has been investigated by Chaudhuri, Walker, and Mitchell 
[5]. They transform the integer program (7) into a cardinality-constrained stable set 
problem in comparability graphs, with the objective to identify a stable set of minimum 
weight among all stable sets of maximum cardinality. The weighted stable set problem in 
comparability graphs can be transformed in polynomial time to a maximum cut problem 
on a digraph, in which the maximum cut corresponds to the maximum weighted stable 
set, cf [10,16]. However, the resulting digraph is dense while the digraph resulting from 
our transformation has a very sparse structure, since the set L of temporal constraints is
usually sparse. Moreover, the directed graph D as defined above need not be acyclic, and
thus cannot be derived from a transitive orientation of the comparability graph defined 
in [5]. 



3 Experimental Study 

We first compare the performance of the Lagrangian relaxation approach with the LP- 
relaxation of (1) - (5). We then empirically analyze how the running time depends on 
the time horizon and the number of activities. We also analyze the dependency of both 
running time and solution quality on the scarceness of resources. Next, we compare our 
bounds with those computed by other lower bounding algorithms. We finally briefly 
investigate the computation of feasible schedules from the solution of the Lagrangian 
relaxation. 

We use a standard subgradient method to attack the Lagrangian dual. It is aborted
if the objective value has not improved significantly over five consecutive iterations. If
this happens within the first 10 iterations, we restart the procedure with another choice
of step sizes.
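
The stopping rule can be sketched as follows. This is a generic projected subgradient ascent and not the authors' implementation: the step size rule `step0 / k`, the tolerance, and the toy dual function are assumptions made for illustration, and the restart with different step sizes is omitted.

```python
def subgradient_ascent(evaluate, lam, step0=1.0, patience=5, tol=1e-6, max_iter=200):
    """Maximise a Lagrangian dual function over lam >= 0.

    evaluate(lam) must return (dual value, subgradient) at lam.
    The loop stops after `patience` consecutive iterations without a
    significant improvement of the best dual value found so far.
    """
    best, stalled = float("-inf"), 0
    for k in range(1, max_iter + 1):
        value, grad = evaluate(lam)
        if value > best + tol:
            best, stalled = value, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
        # Step along the subgradient and project back onto lam >= 0.
        lam = [max(0.0, l + (step0 / k) * g) for l, g in zip(lam, grad)]
    return best


# Toy problem: choose one (cost, usage) pair with usage <= CAPACITY.
POINTS = [(10, 0), (4, 3), (1, 5)]
CAPACITY = 2


def toy_dual(lam):
    cost, usage = min(POINTS, key=lambda q: q[0] + lam[0] * (q[1] - CAPACITY))
    return cost + lam[0] * (usage - CAPACITY), [usage - CAPACITY]


lower_bound = subgradient_ascent(toy_dual, [0.0])
```

In the toy problem the only feasible choice has cost 10, so every dual value is a valid lower bound on 10; the dual function itself never exceeds 6.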

For the computations we have used strengthened resource inequalities which have
been proposed in [7]. They guarantee that no activity is scheduled parallel to the dummy
sink $n + 1$.

$$\sum_j r_{jk} \Big( \sum_{s=t-p_j+1}^{t} x_{js} \Big) + R_k \sum_{s=ES_{n+1}}^{t} x_{n+1,s} \le R_k, \qquad k \in \mathcal{R},\ t = 0, \dots, T \qquad (9)$$

3.1 Benchmark Instances 

We have applied our algorithm to the test beds of the ProGen and the ProGen/max library 
[13,21], and to a small test bed of problems modeled after chemical production processes 
with labor constraints [11]. 

The ProGen library [13] provides instances for precedence-constrained scheduling 
with multiple resource constraints and with 30, 60, 90, and 120 activities, respectively. 
They are generated by modifying three parameters, the network complexity, which re- 
flects the average number of direct successors of an activity, the resource factor, which 
describes the average number of resources required in order to process an activity, and 
the resource strength, which is a measure of the scarcity of the resources. The resource
strength varies between 0.1 and 0.7, where a small value indicates very scarce resources.
This variation results in 480 instances of each of the first three instance sizes (30, 60,
90), and 600 instances of 120 activities. The activity durations were chosen randomly
between 1 and 10 and the maximum number of different resources is 4 per activity.
The library also contains best known upper bounds on the makespan of these instances,
which we have used as time horizon $T$. From the whole set of instances we only took
those for which the given upper bound is larger than the trivial lower bound $LB_0$, which
is the earliest start time $ES_{n+1}$ of the dummy activity $n + 1$. The number of instances
then reduces to 264 (30 activities), 185 (60 activities), 148 (90 activities), and 432 (120
activities). The time horizon of these instances varies between 35 and 306. For further
details we refer to [13].

The ProGen/max library [21] provides 1080 instances for scheduling problems with 
time windows and multiple resource requirements, each of which consists of 100 ac- 
tivities. The parameters are similar to those of the ProGen library, but an additional 
parameter controls the number of cycles in the digraph of temporal constraints. 21 of 
the 1080 instances are infeasible and for another 693 instances there exists a feasible 
solution with a project makespan which equals the trivial lower bound $LB_0$. Thus, the
number of instances of interest within this test bed reduces to 366. For these instances, 
the time horizon varies between 253 and 905. For further details we refer to [21]. 

Finally, we consider instances which have their origin in a labor-constrained scheduling
problem (LCSP instances) from BASF AG, Germany, which can briefly be summarized
as follows: The production process for a set of orders has to be scheduled. Every order
represents the output of a constant amount of a chemical product, and the aim is to mini- 
mize the project makespan. The production process for an order consists of a sequence 
of identical activities, each of which must be scheduled non-preemptively. Due dates 
for individual orders are given, and due to technical reasons there may be precedence 
constraints between activities of different orders. Additionally, resource constraints have 
to be respected, which are imposed by a limited number of available workers: An activity 
usually consists of several consecutive tasks which require a certain amount of personnel. 
Thus, the personnel requirement of an activity is a piecewise constant function. More 
details can be found in [12]. The instances considered here are taken from [11]. 

3.2 Computational Results 

Computing Environment. Our experiments were conducted on a Sun Ultra 2 with 200
MHz clock pulse operating under Solaris 2.6 with 512 MB of memory. The code is
written in C++ and has been compiled with the GNU g++ compiler version 2.7.2. We
use Cherkassky and Goldberg's maximum flow code [6]. It is written in C and has
been compiled with the GNU gcc compiler version 2.7.2. Both compilers used the -O3
optimization option. All reported CPU times have been averaged over three runs.

LP relaxation versus Lagrangian relaxation. Since the LP relaxation of (7) is integral, 
the lower bounds obtained by the Lagrangian relaxation are bounded from above by the 
optimal solution of the LP relaxation. Since the LPs are usually very large we have 
compared the running times to solve these LPs with the Lagrangian approach. For the 
ProGen testbed with 30 activities [13], the LPs are solved within 18 seconds on average
(max. 516 sec.) with CPLEX version 4.0.8, while the Lagrangian relaxation plus the
subgradient optimization requires only one second (max. 6.2 sec.) at an average number
of 104 iterations. The average deviation of the solution values turned out to be less than
1%.

For the LCSP instances Cavalcante et al. [4] have solved LP-relaxations of different 
integer programming formulations in order to obtain both lower bounds and ideas for 
generating feasible schedules. They report on excessive computation times to solve the 
LP-relaxation of (1) - (5); particularly for large instances they require more than 5 hours
on average on a RS6000 model 590 (see [4], Table 1). Solving the corresponding Lagrangian
relaxation (including the subgradient optimization) requires only one minute
on average for the same instances, while the lower bounds obtained are only slightly
inferior (less than 2%). It turns out that for these instances the same bounds can also
be obtained by solving a weaker LP-relaxation based on a weaker formulation of the 
temporal constraints (see [4], Table 2); here the computation times correspond to those 
of the Lagrangian approach. However, note that the Lagrangian relaxation is based on 
the strong formulation of the temporal constraints (3), and thus has the potential of 
providing better bounds. 



Problem characteristics. We have empirically analyzed the running time and the 
performance of the Lagrangian method with respect to varying problem parameters. Fi- 
gures 2 (a) and (b) display plots that show how the running time depends on both the time 
horizon and the number of activities. Each plot additionally contains a corresponding 
regression curve. Since other algorithms often require large running times when dealing 
with instances consisting of very hard resource constraints, we investigate the depen- 
dency of the running time of our algorithm on the resource strength. Recall that a low 
resource strength parameter is an indicator for such instances. As depicted in Fig. 2 (c), 
the running time of our algorithm seems only slightly affected by the resource strength 
parameter. 



Other lower bounding algorithms. In comparison to other lower bounding procedures,
our approach behaves quite reasonably with respect to the tradeoff between quality
and computation time. For the scenario with precedence constraints, we compare our 
algorithm with two other approaches which are both based on [14] . First, we consider the 
lower bounds reported by Brucker and Knust [3] which are the strongest known bounds 
for the ProGen instances. Second, we have implemented the $O(|V|^2)$ lower bound $LB_3$
(cf. Sect. 1). The average results on the running time and the quality of the lower bounds 
are provided in Table 1, while Fig. 2 (d) displays the quality of the bounds depending on
the resource strength parameter. Compared to the bound $LB_3$, our algorithm produces
far better bounds in most of the cases. While the computation time for $LB_3$ is negligible
(< 0.5 sec.), the algorithm of Brucker and Knust provides better bounds, but in
exchange for much larger running times. To obtain the lower bounds for the instances
which consist of 120 activities, their algorithm occasionally requires a couple of days 
per instance (on a Sun Ultra 2 with 167 MHz clock pulse), as reported to us in private 
communication. We could solve all of these instances within an average of less than a 
minute and a maximum of 362 seconds using 12 MB memory. 

For the instances with time windows, the algorithm proposed by Brucker and Knust 
[3] cannot be applied, since it is developed for the model with precedence constraints 
only. The best known lower bounds collected in the library are computed by different 
algorithms, mostly by a combination of preprocessing steps and a generalization of the 
lower bound $LB_3$. As indicated in Table 1, the results of our algorithm on this test bed
are less satisfactory with respect to quality of the bounds as well as running times. The 
reason might be a weaker average resource strength which leads to bounds of low quality, 
and also the larger time horizons which result in large running times. However, we were 
able to improve 38 of the best known lower bounds among the 366 instances. 









[Fig. 2, panels (a)-(d): plots not reproduced. Panel (b): running time versus number of activities (30, 60, 90, 120).]




Fig. 2. Plots (a) and (b) show the running time depending on the time horizon and
the number of activities (based on all instances of the ProGen library). Graphic (c)
displays the effect of the resource strength on the running time for fixed $n = 120$ and
$T \in [100, 120]$, and Graphic (d) visualizes the quality of the different bounds depending
on the resource strength (with respect to the critical path lower bound $LB_0$).



For the LCSP instances, our computational experiences coincide with the above
observation that the quality of our lower bounds increases when the availability of 
personnel (resources) is very low and vice versa. 



Table 1. Lower bounds obtained by the Lagrangian relaxation for the different test beds
as described in Sect. 3.1. prec and temp indicate whether the instances consist of
precedence constraints [13] or of arbitrary time lags [21], respectively.





                                                                  best known
Type   #act.   #inst.    LB_0      LB       UB      CPU     LB_3      LB      CPU
prec     60     185      71.3     78.8     90.6     6.1     74.2     85.6    13.5
prec     90     148      86.3     99.6    115.8    20.0     86.8    106.1   170.8
prec    120     432      94.6    116.7    137.6    56.9    102.0    124.9    n.a.
temp    100     366     431.4    435.9    499.0    72.1    434.2    452.2    n.a.



Computing feasible schedules. Besides the computation of lower bounds, both LP
and Lagrangian relaxations allow the construction of good upper bounds by exploiting
the structure of the corresponding solution within heuristics. For labor-constrained
scheduling problems, Cavalcante et al. [4] as well as Savelsbergh, Uma, and Wein [19]
have proposed such techniques. So far, we have performed experiments for the ProGen
instances by extracting orderings of the activities from the solution of the Lagrangian
relaxation and using them as priority rules to generate feasible solutions. It turns out that
these priority rules deliver the best schedules when compared to standard priority lists that
can be found in the literature. The average deviation of these upper bounds compared to
the best known upper bounds is 18%.

4 Concluding Remarks 

We have presented a lower bounding procedure that can be applied to a wide variety of 
resource-constrained project scheduling problems. The bounds obtained by this algo- 
rithm can be computed fast and are particularly suitable for scenarios with very scarce 
resources. The algorithm is easy to implement since it basically solves a sequence of 
minimum $s$-$t$-cut problems.

Future research will be concerned with the integration of other classes of inequalities, 
which may strengthen the lower bounds at low computational costs. The structure of the 
underlying minimum cut problem remains unchanged. In particular, motivated by [14], 
such inequalities can be derived by identifying sets $W$ of activities out of which not
more than $\ell < |W|$ activities can be scheduled simultaneously. Furthermore, it could be
valuable to adapt the maximum flow algorithm for our specific application, and also to 
recycle the flow (cut) data of the previous iteration. 

Acknowledgements. The authors are grateful to Matthias Muller-Hannemann for 
many helpful discussions on maximum flow algorithms, and to Olaf Jahn for his technical 
support.

Abstract. We give a polynomial approximation scheme for the problem of scheduling
on uniformly related parallel machines for a large class of objective functions
that depend only on the machine completion times, including minimizing the
$\ell_p$ norm of the vector of completion times. This generalizes and simplifies many
previous results in this area.



1 Introduction 

We are given $n$ jobs with processing times $p_j$, $j \in J = \{1, \dots, n\}$, and $m$ machines
$M_1, M_2, \dots, M_m$ with speeds $s_1, s_2, \dots, s_m$. A schedule is an assignment of the $n$
jobs to the $m$ machines. Given a schedule, for $1 \le i \le m$, $T_i$ denotes the weight of
machine $M_i$, which is the total processing time of all jobs assigned to it, and $C_i$ denotes
the completion time of $M_i$, which is $T_i/s_i$. (Each job is assigned to exactly one machine,
i.e., we do not allow preemption.)

Our objective is, for some fixed function $f : [0, +\infty) \to [0, +\infty)$, one of the
following:

(I) minimize $\sum_{i=1}^m f(C_i)$,
(II) minimize $\max_{i=1}^m f(C_i)$,
(III) maximize $\sum_{i=1}^m f(C_i)$, or
(IV) maximize $\min_{i=1}^m f(C_i)$.
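
To make the four objectives concrete, the following sketch evaluates them for a given schedule. The function names and the list-based encoding of a schedule are assumptions made for this illustration only.

```python
def machine_completion_times(p, speeds, assignment):
    """Return C_i = T_i / s_i, where assignment[j] is the machine of job j."""
    weights = [0] * len(speeds)
    for job, machine in enumerate(assignment):
        weights[machine] += p[job]  # T_i: total processing time on machine i
    return [t / s for t, s in zip(weights, speeds)]


def objectives(f, completion):
    """Sum, maximum and minimum of f(C_i); (I)-(IV) minimise or maximise these."""
    values = [f(c) for c in completion]
    return {"sum": sum(values), "max": max(values), "min": min(values)}


# Three jobs on two machines with speeds 1 and 2.
completion = machine_completion_times([4, 2, 2], [1, 2], [1, 0, 0])
linear = objectives(lambda x: x, completion)
squared = objectives(lambda x: x * x, completion)
```

In this example both machines carry weight 4, but the faster machine finishes at time 2 while the slower one finishes at time 4.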

Most of such problems are NP-hard, see [9,14]. Thus we are interested in approximation
algorithms. Recall that a polynomial time approximation scheme (PTAS) is a
family of polynomial time algorithms over $\varepsilon > 0$ such that for every $\varepsilon$ and every instance
of the problem, the corresponding algorithm outputs a solution whose value is within a
factor of $(1 + \varepsilon)$ of the optimum value [9].

We give a PTAS for scheduling on uniformly related machines for a rather general 
set of functions $f$, covering many natural functions studied before. Let us give some
examples covered by our results and references to previous work. 

Example 1. Problem (II) with $f(x) = x$ is the basic problem of minimizing the maximal
completion time (makespan). It was studied for identical machines in [10,11,15,12];
the last paper gives a PTAS. Finally, Hochbaum and Shmoys [13] gave a PTAS for
uniformly related parallel machines. Thus our result can be seen as a generalization of
that result. In fact our paper is based on their techniques, perhaps somewhat simplified.

Example 2. Problem (I) with $f(x) = x^p$ for any $p > 1$. This is equivalent to
minimizing the $\ell_p$ norm of the vector $(C_1, \dots, C_m)$, i.e., $(\sum_{i=1}^m C_i^p)^{1/p}$. For $p = 2$ and
identical machines this problem was studied in [6,5], motivated by storage allocation
problems. For a general $p$ a PTAS for identical machines was given in [1]. For related
machines this problem was not studied before. Note that for $p \le 1$ the minimization
problem is trivial: it is optimal to schedule all jobs on the fastest machine (choose
one arbitrarily if there are more of them). In this case a more interesting variant is the
maximization version, i.e., (III).

Example 3. Problem (IV) with $f(x) = x$. In this problem the goal is to maximize
the time when all the machines are running; this corresponds to keeping all parts of 
some system alive as long as possible. This problem on identical machines was studied 
in [8,7], and [16] gave a PTAS. For uniformly related machines a PTAS was given in [3]. 

Example 4. Scheduling with rejection. In this problem each job has an associated
penalty. The schedule is allowed to schedule only some subset of jobs, and the goal is
to minimize the maximal completion time plus the total penalty of all rejected jobs.
This problem does not conform exactly to any of the categories above; nevertheless, our
scheme can be extended to work for it as well. This problem was studied in [4], where
a PTAS for the case of identical machines is also given. For related machines this problem
was not studied.

Our paper is directly motivated by Alon et al. [2], who proved similar general results
for scheduling on identical machines. We generalize it to uniformly related machines
and a similar set of functions $f$. Even for identical machines, our result is stronger than
that of [2] since in that case we allow a more general class of functions $f$.

The basic idea is to round the size of all jobs to a constant number of different 
sizes, to solve the rounded instance exactly, and then re-construct an almost optimal 
schedule for the original instance. This rounding technique traces back to Hochbaum 
and Shmoys [12]. We also use an important improvement from [1,2]: the small jobs are 
clustered into blocks of jobs of small but non-negligible size. The final ingredient is that 
of [13,3]: the rounding factor is different for each machine, increasing with the weight 
of the machine (i.e., the total processing time of jobs assigned to it). Such rounding is 
possible if we assign the jobs to the machines in the order of non-decreasing weight. This
is easy to do for identical machines. For uniformly related machines we prove, under a
reasonable condition on the function $f$, that in a good solution the weight of machines
increases with their speed, which fixes the order of machines. 



2 Our Assumptions and Results 



Now we state our assumptions on the function $f$. Let us note at the beginning that the
typical functions used in scheduling problems satisfy all of them.

The first condition says that to approximate the contribution of a machine up to a
small multiplicative error, it is sufficient to approximate the completion time of the
machine up to a small multiplicative error. This condition is from Alon et al. [2], and it
is essential for all our results.

(F*): $(\forall \varepsilon > 0)(\exists \delta > 0)(\forall x, y > 0)$
$(|y - x| \le \delta x \to |f(y) - f(x)| \le \varepsilon f(x))$.

If we transform $f$ so that both axes are given a logarithmic scale, the condition naturally
translates into uniform continuity, as $|y - x| \le \delta x$ iff $(1 - \delta)x \le y \le (1 + \delta)x$ iff
$\ln(1 - \delta) + \ln x \le \ln y \le \ln(1 + \delta) + \ln x$, and similarly for $f(x)$. More formally, (F*)
is equivalent to the following statement:

(F**): The function $h_f : (-\infty, +\infty) \to (-\infty, +\infty)$
defined by $h_f(z) = \ln f(e^z)$
is defined everywhere and uniformly continuous.
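
As a worked instance of (F**), consider $f(x) = x^p$ for a fixed $p > 0$ (the objective of Example 2 when $p > 1$). The short computation below is only an illustration under that assumption:

```latex
h_f(z) = \ln f(e^z) = \ln\bigl(e^{pz}\bigr) = p\,z ,
\qquad
|h_f(z_1) - h_f(z_2)| = p\,|z_1 - z_2| .
```

Hence $h_f$ is defined everywhere and Lipschitz continuous with constant $p$, so it is uniformly continuous and $f(x) = x^p$ satisfies (F*).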

We also need a condition that would guarantee that in the optimal schedule, the 
weights of the non-empty machines are monotone. These conditions are different for 
the cases when the objective is to maximize or minimize $\sum f(C_i)$, and for the cases of
min-max or max-min objectives. 

Recall that a function $g$ is convex iff for every $x \le y$ and $0 \le \Delta \le y - x$,
$g(x + \Delta) + g(y - \Delta) \le g(x) + g(y)$. For the cases of min-sum and max-sum we
require this condition:

(G*): The function $g_f : (-\infty, +\infty) \to [0, +\infty)$
defined by $g_f(z) = f(e^z)$ is convex.

Note that $f(x) = g_f(\ln x)$. Thus, the condition says that the function $f$ is convex if
plotted in a graph with a logarithmic scale on the x-axis and a linear scale on the y-axis.
This is true for example for any non-decreasing convex function $f$. However, $g_f$ is
convex, e.g., even for $f(x) = \ln(1 + x)$. On the other hand, the condition (G*) implies
that $f$ is either non-decreasing or it is unbounded for $x$ approaching 0.
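
Condition (G*) can be probed numerically. The sketch below samples second differences of $g_f(z) = f(e^z)$ on a grid; it is a sanity check under assumed grid parameters, not a proof, since it can only refute convexity on the sampled points. The function name and the test function $x/(1+x)$ are chosen for this illustration.

```python
import math


def convex_on_log_scale(f, lo=-10.0, hi=10.0, steps=400, h=0.05, tol=1e-12):
    """Sampled check that g_f(z) = f(exp(z)) has non-negative second differences."""
    def g(z):
        return f(math.exp(z))

    for k in range(steps + 1):
        z = lo + (hi - lo) * k / steps
        # For a convex g this symmetric second difference is non-negative.
        if g(z - h) - 2.0 * g(z) + g(z + h) < -tol:
            return False
    return True


log_ok = convex_on_log_scale(math.log1p)                # f(x) = ln(1 + x)
square_ok = convex_on_log_scale(lambda x: x * x)        # f(x) = x^2
ratio_ok = convex_on_log_scale(lambda x: x / (1.0 + x)) # f(x) = x / (1 + x)
```

The concave function $\ln(1 + x)$ passes the check, in line with the remark above, while $x/(1+x)$ fails it because $g_f$ is then the logistic function, which is concave for $z > 0$.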

For the case of min-max or max-min we require that the function $f$ is bimodal
on $(0, +\infty)$, i.e., there exists an $x_0$ such that $f$ is monotone (non-decreasing or
non-increasing) both on $(0, x_0]$ and $[x_0, +\infty)$. (E.g., $(x - 1)^2 + 1$ is bimodal, decreasing
on $(0, 1]$ and increasing on $[1, +\infty)$.) This includes all convex functions as well as all
non-decreasing functions.

Note that none of the conditions above puts any constraints on $f(0)$.

Last, we need the function $f$ to be computable in the following sense: for any $\varepsilon > 0$
there exists an algorithm that on any rational $x$ outputs a value between $(1 - \varepsilon)f(x)$ and
$(1 + \varepsilon)f(x)$, in time polynomial in the size of $x$. To simplify the presentation, we will
assume in the proofs that $f$ is computable exactly; choosing a smaller $\varepsilon$ and computing
the approximations instead of the exact values always works. Typical functions $f$ that
we want to use are computable exactly, but for example if $f(x) = x^p$ for non-integral $p$
then we can only approximate it.

Our main results are:

Theorem 1. Let $f$ be a non-negative computable function satisfying the conditions (F*)
and (G*). Then the scheduling problems of minimizing and maximizing $\sum f(C_i)$ on
uniformly related machines both possess a PTAS.

Theorem 2. Let $f$ be a non-negative computable bimodal function satisfying the condition
(F*). Then the scheduling problems of minimizing $\max f(C_i)$ and of maximizing
$\min f(C_i)$ on uniformly related machines both possess a PTAS.

Theorem 3. Let $f$ be a non-negative computable function satisfying the condition (F*).
Then the scheduling problems of minimizing or maximizing $\sum f(C_i)$, of minimizing
$\max f(C_i)$, and of maximizing $\min f(C_i)$ on identical machines all possess a PTAS.

All our PTAS run in time $O(n^c \cdot p(|I|))$, where $c$ is a constant depending on
the desired precision and $f$, $p$ is the polynomial bounding the time of the computation of
$f$, and $|I|$ is the size of the input instance. Thus the time is polynomial, but the exponent
in the polynomial depends on $f$ and $\varepsilon$. This should be contrasted with Alon et al. [2],
where for the case of identical machines they are able to achieve linear time (i.e., the 
exponential dependence on e is only hidden in the constant) using integer programming 
in fixed dimension. It is an open problem if such an improvement is also possible for 
related machines; this is open even for the case of minimizing the makespan (Example 
1 above). 

3 Ordering of the Machines 

The following lemma implies that, depending on the type of the problem and $f$, we can
order the machines in either non-decreasing or non-increasing order of speeds and then
consider only the schedules in which the weights of the machines are non-decreasing
(possibly with the exception of the empty machines). It shows why the conditions (G*)
and bimodality are important for the respective problems; it is the only place where the 
conditions are used. Note that for identical machines the corresponding statements hold 
trivially without any condition. 

Lemma 4. Let the machines be ordered so that the speeds $s_i$ are non-decreasing.

(i) Under the same assumptions as in Theorem 1, there exists a schedule with minimal
(maximal, resp.) $\sum f(C_i)$ in which the non-zero weights of the machines are monotone
non-decreasing (monotone non-increasing, resp.). I.e., for any $1 \le i < j \le m$
such that $T_i, T_j > 0$, we have $T_i \le T_j$ ($T_i \ge T_j$, resp.).

(ii) Under the same assumptions as in Theorem 2, there exist schedules both with minimal
$\max f(C_i)$ and with maximal $\min f(C_i)$ in which the non-zero weights of the
machines are either monotone non-decreasing or monotone non-increasing.

Proof. (i) We prove the lemma for minimization of $\sum f(C_i)$; the case of maximization
is similar. We prove that if $T_i > T_j$ for some $i < j$, then switching the assigned jobs between $M_i$
and $M_j$ leads to a schedule that is at least as good. This is sufficient, since given any optimal
schedule we can obtain an optimal schedule with ordered weights by at most $m - 1$ such
transpositions.

Denote $s = s_j/s_i$, $\Delta = \ln s$, $X = \ln C_j = \ln(T_j/s_j)$, and $Y = \ln C_i = \ln(T_i/s_i)$.
From the assumptions we have $X \le Y$ and $0 \le \Delta \le Y - X$. The difference of the
original cost and the cost after the transposition is $f(C_i) + f(C_j) - f(C_j \cdot s) - f(C_i/s) =
g_f(X) + g_f(Y) - g_f(X + \Delta) - g_f(Y - \Delta) \ge 0$, using convexity of $g_f$. Thus the
transposed schedule is at least as good as the original one.

(ii) Suppose that we are maximizing $\min f(C_i)$ and $f$ is bimodal, first non-increasing
then non-decreasing. We prove that if $T_i > T_j$ for $i < j$, then switching the assigned jobs
between $M_i$ and $M_j$ can only improve the schedule. We have $C_j/s \le C_i$, $C_j \le C_i \cdot s$,
for $s = s_j/s_i$. By bimodality we have $\min\{f(C_i), f(C_j)\} \le \min\{f(C_j/s), f(C_i \cdot s)\}$, hence the
transposed schedule can only be better. The cases of minimization of $\max f(C_i)$ and $f$
first non-decreasing then non-increasing are similar, with the order reversed as needed.



4 Preliminaries and Definitions 

Let $\delta > 0$ and $\lambda$ be such that $\lambda = 1/\delta$ is an even integer; we will choose it later. The
meaning of $\delta$ is the (relative) rounding precision.

Given $w$, either 0 or an integral power of two, intuitively the order of magnitude, we
will represent a set of jobs with processing times not larger than $w$ as follows. For each
job of size more than $\delta w$ we round its processing time to the next higher multiple of
$\delta^2 w$; for the remaining small jobs we add their processing times, round up to the next
higher multiple of $\delta w$, and treat these jobs as some number of jobs of processing time
$\delta w$. Now it is sufficient to remember the number $n_i$ of modified jobs of processing time
$i\delta^2 w$, for each $i$, $\lambda \le i \le \lambda^2$. Such a vector together with $w$ is called a configuration.

In our approximation scheme, we will proceed machine by machine, and use this 
representation for two types of sets of jobs. First one is the set of all jobs scheduled so 
far; we represent them always with the least possible w (principal configurations below). 
The second type are the sets of jobs assigned to individual machines; we represent them 
with w small compared to the total processing time of their jobs (heavy configurations 
below), and this is sufficient to guarantee that the value of $f$ can be approximated well.

Definition 5. Let $A \subseteq J$ be a set of jobs.

- The weight of a set of jobs is $W(A) = \sum_{j \in A} p_j$.

- A configuration is a pair $\alpha = (w, (n_\lambda, n_{\lambda+1}, \ldots, n_{\lambda^2}))$, where $w = 0$ or $w = 2^i$
for some integer $i$ (possibly negative) and $n$ is a vector of nonnegative integers.

- A configuration $(w, n)$ represents $A$ if

(i) no job $j \in A$ has processing time $p_j > w$,

(ii) for any $i$, $\lambda < i \le \lambda^2$, $n_i$ equals the number of jobs $j \in A$ with
$p_j \in ((i-1)\delta^2 w, i\delta^2 w]$, and

(iii) $n_\lambda = \lceil W(\bar{A}) / (\delta w) \rceil$ where $\bar{A} = \{j \in A \mid p_j \le \delta w\}$.

- The principal configuration of $A$ is the configuration $\alpha(A) = (w, n)$ with the smallest
$w$ that represents $A$.

- The weight of a configuration $(w, n)$ is defined by $W(w, n) = \sum_{i=\lambda}^{\lambda^2} n_i \, i \delta^2 w$.

- A configuration $(w, n)$ is called heavy if its weight $W(w, n)$ is at least $w/2$.

- The successor of a configuration $(w, n)$ is $\mathrm{succ}(w, n) = (w, n + (1, 0, \ldots, 0))$.

Note that given an $A \subseteq J$ and $w$, the definition gives a linear-time procedure that
either finds the unique configuration $(w, n)$ representing it or decides that no such
configuration exists.
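
The procedure described by the definition can be sketched as follows. This is only an illustration under our own assumptions: the names `represent`, `principal_w` and `config_weight`, the list layout of the vector (entry $k$ holds $n_{\lambda+k}$), and the use of exact fractions are not part of the paper.

```python
from fractions import Fraction
from math import ceil


def represent(jobs, w, lam):
    """Return the vector n of the configuration (w, n) representing `jobs`,
    or None if no such configuration exists.

    jobs : iterable of positive processing times p_j
    w    : 0 or a power of two
    lam  : the even integer lambda = 1/delta
    The result is a list whose entry k holds n_{lam + k}, lam <= lam + k <= lam^2.
    """
    jobs = [Fraction(p) for p in jobs]
    w = Fraction(w)
    n = [0] * (lam * lam - lam + 1)
    if any(p > w for p in jobs):        # condition (i) is violated
        return None
    if w == 0:                          # only the empty set is represented
        return n
    small_total = Fraction(0)
    for p in jobs:
        if p * lam <= w:                # small job: p_j <= delta * w
            small_total += p
        else:                           # p_j in ((i-1) delta^2 w, i delta^2 w]
            i = ceil(p * lam * lam / w)
            n[i - lam] += 1
    n[0] = ceil(small_total * lam / w)  # condition (iii)
    return n


def principal_w(jobs):
    """Smallest w (0 or a power of two) such that no job is longer than w."""
    jobs = [Fraction(p) for p in jobs]
    if not jobs:
        return Fraction(0)
    pmax, w = max(jobs), Fraction(1)
    while w < pmax:
        w *= 2
    while w / 2 >= pmax:
        w /= 2
    return w


def config_weight(w, n, lam):
    """W(w, n) = sum over i of n_i * i * delta^2 * w."""
    return sum(c * (lam + k) for k, c in enumerate(n)) * Fraction(w) / (lam * lam)
```

For instance, with $\lambda = 2$ and processing times $1, 1/2, 3, 5/2$, the sketch picks $w = 4$; the two small jobs are merged into one unit of size $\delta w = 2$, and the two large jobs both fall into the class $i = 3$.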

Lemma 6.

(i) Let $A \subseteq J$ be a set of jobs and let $(w, n)$ be any configuration representing it. Then
$(1 - \delta) W(w, n) - \delta w \le W(A) \le W(w, n)$.

(ii) Any $A \subseteq J$ has a unique principal configuration, and it can be constructed in linear
time. The number of principal configurations is bounded by $(n + 1)^{\lambda^2}$ and they can
be enumerated efficiently.

(iii) Let $A, A' \subseteq J$ be both represented by $(w, n)$. Then for any $w' > w$ they are both
represented by $(w', n')$ for some $n'$.



Proof. (i) Any job $j \in A$ with $p_j > \delta w$ contributes to $W(w, n)$ some $r$, $p_j \le r <
p_j + \delta^2 w$. Its contribution to $W(A)$ is $p_j > r - \delta^2 w > r - \delta p_j \ge (1 - \delta) r$. The small
jobs $j \in A$, $p_j \le \delta w$, can cause a total additive error of at most $\delta w$. Summing over all
jobs, the bound follows.

(ii) The principal configuration of $A = \emptyset$ has $w = 0$. For a nonempty $A$, find a job
$j \in A$ with the largest processing time $p_j$, and round up $p_j$ to a power of two to obtain $w$.
It follows that there are at most $n + 1$ possible values of $w$ in the principal configurations.
To enumerate all principal configurations with a given $w$, find a representation $(w, n)$
of $J(w) = \{j \in J \mid p_j \le w\}$ and enumerate all the vectors $n'$ bounded by $0 \le n' \le n$
(coordinatewise), such that $n'_i > 0$ for some $i > \lambda^2/2$. (Here we use the fact that $\lambda$ is
even.)

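(iii) If $w = 0$ then $A = A' = \emptyset$ and the statement is trivial. Otherwise define

$$
n^+_i =
\begin{cases}
\left\lceil W(1, (n_\lambda, n_{\lambda+1}, \ldots, n_{2\lambda}, 0, \ldots, 0)) / (2\delta) \right\rceil & \text{for } i = \lambda, \\
n_{2i-1} + n_{2i} & \text{for } \lambda < i \le \lambda^2/2, \\
0 & \text{for } i > \lambda^2/2.
\end{cases}
$$

It is easy to verify that if $A$ is represented by $(w, n)$ then it is represented by $(2w, n^+)$.
Iterating this operation sufficiently many times proves that the representation with any
$w' > w$ is the same for both $A$ and $A'$.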



Next we define a difference of principal configurations and show how it relates to a 
difference of sets. This is essential for our scheme. (It is easy to define difference of any 
configurations, but we do not need it.) 

Definition 7. Let $(w, n)$ and $(w', n')$ be two principal configurations. Their difference
is defined as follows. First, let $(w', n'')$ be the configuration that represents the same sets
of jobs as $(w, n)$ (using Lemma 6 (iii)). Now define $(w', n') - (w, n) = (w', n' - n'')$.
If $w' < w$, or the resulting vector has some negative coordinate(s), the difference is
undefined.



Lemma 8. Let $A \subseteq J$ be a set of jobs and $(w, n)$ its principal configuration.

(i) Let $(w', n')$ be a principal configuration such that $(w', n') - (w, n)$ is defined. Then
there exists a set of jobs $B$ represented by $(w', n')$ such that $A \subseteq B \subseteq J$, and it
can be constructed in linear time.

(ii) Let $B$ be any set of jobs such that $A \subseteq B \subseteq J$, and let $(w', n')$ be its principal
configuration. Then $\gamma = (w', n') - (w, n)$ is defined and $B - A$ is represented by
$\gamma$ or $\mathrm{succ}(\gamma)$. Furthermore, if $\gamma = (w', n') - (w, n)$ is heavy then the weight of
$B - A$ is bounded by $|W(B - A) - W(\gamma)| \le 3\delta W(\gamma)$.

Proof. Let $(w', n'')$ be the configuration representing $A$; it can be computed from $A$
and $w'$ in linear time.

(i) Since the difference is defined, $n'' \le n'$. For each $i$, $\lambda < i \le \lambda^2$, add $n'_i - n''_i$ jobs
with processing time $p_j \in ((i-1)\delta^2 w', i\delta^2 w']$; since $(w', n')$ is a principal configuration
we are guaranteed that a sufficient number of such jobs exists. Finally, add jobs with
$p_j \le \delta w'$ one by one until $n''_\lambda$ increases to $n'_\lambda$. Each added job increases the coordinate
by at most 1, and we have sufficiently many of them since $(w', n')$ is principal.

(ii) It is easy to see that $A \subseteq B$ implies that $w \le w'$ and $n'' \le n'$, thus the
difference is defined. For $i$, $\lambda < i \le \lambda^2$, $n'_i - n''_i$ is the number of jobs in $B - A$ with
the appropriate processing times. Let $W_A$ and $W_B$ be the weight of jobs in $A$ and $B$
with $p_j \le \delta w'$. We have $n''_\lambda - 1 < W_A/(\delta w') \le n''_\lambda$ and $n'_\lambda - 1 < W_B/(\delta w') \le n'_\lambda$,
thus $n'_\lambda - n''_\lambda - 1 < (W_B - W_A)/(\delta w') < n'_\lambda - n''_\lambda + 1$. This is rounded to $n'_\lambda - n''_\lambda$
or $n'_\lambda - n''_\lambda + 1$, hence $B - A$ is represented by $\gamma$ or $\mathrm{succ}(\gamma)$. By Lemma 6 (i) it follows that

$(1 - \delta) W(\gamma) - \delta w' \le W(B - A) \le W(\mathrm{succ}(\gamma)) = W(\gamma) + \delta w'$.

If $\gamma$ is heavy then $\delta w' \le 2\delta W(\gamma)$, and the lemma follows.

From the part (i) of the lemma it also follows that given a principal configuration,
it is easy to find a set it represents: just set $A = \emptyset$. Thus also the difference can be
computed in linear time, using the procedure in the definition.



5 The Approximation Scheme 

Given an $\varepsilon \in (0, 1]$, we choose $\delta$ using (F*) so that $\lambda = 1/\delta$ is an even integer and
$(\forall x, y \ge 0)\,(|y - x| \le 3\delta x \rightarrow |f(y) - f(x)| \le \frac{\varepsilon}{3} f(x))$.
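
As a concrete illustration of this choice, consider the special case $f(x) = x^p$ with an integer $p \ge 1$ (our example, not one singled out by the text). Then $|y - x| \le 3\delta x$ gives $f(y) \le (1 + 3\delta)^p f(x)$, and by convexity the deviation below $f(x)$ is no larger, so it suffices to have $(1 + 3\delta)^p - 1 \le \varepsilon/3$. The hypothetical helper `choose_lambda` below searches for the smallest even $\lambda$ with this property.

```python
from fractions import Fraction


def choose_lambda(eps, p):
    """Smallest even integer lam such that delta = 1/lam satisfies
    (1 + 3*delta)**p - 1 <= eps/3, for the cost function f(x) = x**p.

    Assumes 0 < eps <= 1 and an integer exponent p >= 1."""
    eps = Fraction(eps)
    lam = 2
    while (1 + Fraction(3, lam)) ** p - 1 > eps / 3:
        lam += 2
    return lam
```

For $\varepsilon = 1$ the sketch gives $\lambda = 10$ for $f(x) = x$ and $\lambda = 20$ for $f(x) = x^2$.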



Definition 9. We define the graph $G$ of configurations as follows.

The vertices of $G$ are $(i, \alpha(A))$, for any $1 \le i < m$ and any $A \subseteq J$, the source
vertex $(0, \alpha(\emptyset))$, and the target vertex $(m, \alpha(J))$.

For any $i$, $1 \le i \le m$, and any configurations $\alpha$ and $\beta$, there is an edge from
$(i - 1, \alpha)$ to $(i, \beta)$ iff either $\beta = \alpha$, or $\beta - \alpha$ is defined and $\mathrm{succ}(\beta - \alpha)$ is heavy. The
cost of this edge is defined as $f(W(\beta - \alpha)/s_i)$. There are no other edges.



Definition 10. Let $J_1, \ldots, J_m$ be an assignment of jobs $J$ to machines $M_1, \ldots, M_m$.
Its representation is a sequence of vertices of $G$, $\{(i, \alpha_i)\}_{i=0}^{m}$, where $\alpha_i = \alpha(\bigcup_{i'=1}^{i} J_{i'})$.
Note that we really obtain vertices of $G$, as $\alpha_0 = \alpha(\emptyset)$ and $\alpha_m = \alpha(J)$.

The approximation scheme performs the following steps:

(1) Order the machines with speeds either non-decreasing or non-increasing, according
to the type of the problem and $f$ so that by Lemma 4 there exists an optimal schedule
with non-decreasing non-zero weights of the machines.

(2) Construct the graph $G$.

(3) Find an optimal path in $G$ from the source $(0, \alpha(\emptyset))$ to $(m, \alpha(J))$. The cost of the path
is defined as the sum, maximum or minimum of the costs of the edges used, and
an optimal path is one with the cost minimized or maximized, as specified by the
problem.

(4) Output an assignment represented by the optimal path constructed as follows: Whenever
the path contains an edge of the form $((i - 1, \alpha), (i, \alpha))$, put $J_i = \emptyset$. For every
other edge, apply Lemma 8 (i), starting from the beginning of the path.



Lemma 6 (ii) shows that we can construct the vertices of G in time ). 

Computing the edges of G and their costs is also efficient. Since the graph G is layered, 
finding an optimal path takes linear time in the size of G. Given a path in a graph, finding 
a corresponding assignment is also fast. Hence the complexity of our PTAS is as claimed.
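
The path computation in step (3) is a standard dynamic program over the layers. The following sketch shows it for a minimization objective; the function `optimal_path`, its arguments, and the representation of vertices as arbitrary hashable values are our assumptions, and the toy edge costs in the usage note are not the costs $f(W(\beta - \alpha)/s_i)$ of the graph $G$. For brevity the sketch copies paths, so it is not linear in the size of the graph; storing predecessor pointers instead would avoid this.

```python
def optimal_path(m, vertices, edge_cost, source, target,
                 combine=lambda a, b: a + b):
    """Cheapest path from (0, source) to (m, target) in a layered graph.

    edge_cost(i, a, b) returns the cost of the edge from (i-1, a) to (i, b),
    or None if there is no such edge.  `combine` merges the cost of a path
    with the cost of one more edge (addition for a sum objective, `max`
    for a maximum objective).  Returns (cost, path) or None.
    """
    best = {source: (None, [source])}      # best partial path ending in each vertex
    for i in range(1, m + 1):
        nxt = {}
        for a, (cost_a, path_a) in best.items():
            for b in vertices:
                c = edge_cost(i, a, b)
                if c is None:
                    continue
                total = c if cost_a is None else combine(cost_a, c)
                if b not in nxt or total < nxt[b][0]:
                    nxt[b] = (total, path_a + [b])
        best = nxt
    return best.get(target)
```

As a toy usage, let the vertices be the integers $0, \ldots, 3$ (the amount of work scheduled so far), with an edge from $a$ to $b \ge a$ of cost $(b - a)^2$. With two layers, the sum objective yields cost 5 and the maximum objective yields cost 4.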

Lemma 11.

(i) If $\{J_i\}$ is an assignment with non-decreasing weights of the machines with non-zero
weights (cf. Lemma 4), then its representation $\{(i, \alpha_i)\}_{i=0}^{m}$ is a path in $G$.

(ii) Let $\{J_i\}$ be an assignment whose representation $\{(i, \alpha_i)\}_{i=0}^{m}$ is a path in $G$ and
such that if $\alpha_{i-1} = \alpha_i$ then $J_i = \emptyset$. Let $C$ be the cost of the schedule given by the
assignment, and let $C^\#$ be the cost of the representation as a path in the graph.
Then $|C - C^\#| \le \varepsilon C^\#/3$.

Proof. (i) For any $i = 1, \ldots, m$, if $\alpha_{i-1} = \alpha_i$ then $((i - 1, \alpha_{i-1}), (i, \alpha_i))$ is an edge
by definition. Otherwise by Lemma 8 (ii), the difference $\gamma = \alpha_i - \alpha_{i-1}$ is defined
and $J_i$ is represented by $\gamma$ or $\mathrm{succ}(\gamma)$, and thus $W(J_i) \le W(\mathrm{succ}(\gamma))$. We need to
show $\mathrm{succ}(\gamma)$ is heavy. Let $w$ be the order of $\alpha_i$, and thus also of $\gamma$. If $w = 0$, the
statement is trivial. Otherwise some job with $p_j > w/2$ was scheduled on one of the
machines $M_1, \ldots, M_i$. Since the assignment has non-decreasing weights, it follows that
$w/2 < W(J_i) \le W(\mathrm{succ}(\gamma))$, and $\mathrm{succ}(\gamma)$ is heavy.

(ii) Let $X_i = W(\alpha_i - \alpha_{i-1})$ and let $Y_i = f(X_i/s_i)$ be the cost of the $i$th edge of the
path. If $\alpha_{i-1} = \alpha_i$, then $J_i = \emptyset$ and $f(C_i) = Y_i = f(0)$. Otherwise by Lemma 8 (ii),
$|W(J_i) - X_i| \le 3\delta X_i$. Thus $|C_i - X_i/s_i| = |W(J_i)/s_i - X_i/s_i| \le 3\delta X_i/s_i$ and by
the condition (F*) and our choice of $\delta$ we get $|f(C_i) - Y_i| \le \varepsilon Y_i/3$. Summing over all
edges of the path we get the required bound.

We now finish the proof of Theorems 1 and 2 for the minimization versions; the
case of maximization is similar. Let $C^*$ be the optimal cost, let $C^\#$ be the cost of an
optimal path in $G$, and let $C$ be the cost of the output solution of the PTAS. By Lemma 4
there exists an optimal schedule with non-decreasing weights. Thus by Lemma 11 (i)
it is represented by a path, which cannot be cheaper than the optimal path, and by
Lemma 11 (ii) $C^* \ge (1 - \varepsilon/3) C^\#$. Using Lemma 11 (ii) for the output assignment we
get

$$C \le \left(1 + \frac{\varepsilon}{3}\right) C^\# \le \frac{1 + \varepsilon/3}{1 - \varepsilon/3}\, C^* \le (1 + \varepsilon) C^*.$$

Thus we have found a required approximate solution. (Note that the output solution need
not have non-decreasing weights, due to rounding.)

6 Discussion 

To get more insight into the meaning of the condition (F*), we state the following
characterization for convex functions. We omit the proof.

Observation 12. Suppose $f : [0, +\infty) \to [0, +\infty)$ is convex on $(0, +\infty)$ and not
identically 0. Then $f$ satisfies (F*) if and only if the following conditions hold:

- $f(x) > 0$ for any $x > 0$,

- for $x \to \infty$, $f(x)$ is polynomially bounded both from above and below (i.e., for
some constant $c$, $f(x) \le O(x^c)$ and $f(x) \ge \Omega(1/x^c)$),

- for $x \to 0$, $f(x)$ is polynomially bounded both from above and below (i.e., for some
constant $c$, $f(x) \le O(1/x^c)$ and $f(x) \ge \Omega(x^c)$).

This characterization is related to Conjecture 4.1 of Alon et al. [2] which we now disprove.
The conjecture says that for a convex function $f$, and for the problem of minimizing
$\sum f(C_i)$ on identical machines the following three conditions are equivalent: (i) it has
a PTAS, (ii) it has a polynomial approximation algorithm with a finite performance
guarantee, (iii) the heuristic LPT, which orders the jobs according to non-increasing
processing times and schedules them greedily on the least loaded machine, has a finite
performance guarantee.

We know that if (F*) holds, there is a PTAS (for a computable $f$) [2]. Observation 12
implies that if (F*) does not hold, then LPT does not have a finite performance guarantee;
the proof is similar to Observation 4.1 of [2] (which says that no such algorithm exists for
an exponentially growing function, unless P = NP, by a reduction to KNAPSACK).

Now consider f{x) = where t is some slowly growing unbounded function; 
$t(x) = \log \log \log \log x$ will work. It is easy to verify that any such $f$ is convex and does
not satisfy (F*). However, it is possible to find a PTAS on identical machines using the
integer programming approach of [2]. The function $f$ does not satisfy (F*) on $[0, +\infty)$,
but it satisfies (F*) on any interval $[0, T]$; moreover, for a fixed $\varepsilon$ the value of $\delta$ can
be bounded by $\varepsilon/O(t(T))$. The PTAS now proceeds in the following way. It
computes the bound $T$ on the completion time as the sum of all processing times and
chooses $\delta$ and $\lambda = 1/\delta$ accordingly. Since $T$ is at most singly exponential in the size
of the instance, $\lambda$ is proportional to a triple logarithm of the instance size. Now we use
the integer programming approach from [2]. The resulting algorithm has time complexity
doubly exponential in $\lambda$, which is bounded by the size of the instance. Thus the algorithm
is polynomial.

Let us conclude with a few remarks about the problem of minimizing $\max f(C_i)$.
It is easy to approximate it for any increasing $f$ satisfying (F*): just approximate the
minimum makespan, and then apply $f$ to that. Thus our extension to bimodal functions is
not very strong. However, our techniques apply to a wider range of functions $f$. Suppose
for example that the function $f$ is increasing between 0 and 1, and then again
between $c$ and $+\infty$, with an arbitrary behavior between 1 and $c$. Then it is possible to
prove a weaker version of Lemma 4, saying that for some (almost) optimal schedule, for
any $i < j$, $T_i \le \mu T_j$ (if $T_j > 0$). The constant $\mu$ will depend only on the function $f$.
This is sufficient for the approximation scheme, if we redefine the heavy edges to be the
ones with weight $\mu w/2$ rather than $w/2$, and choose the other constants appropriately
smaller. We omit the details and precise statement, since this extension of our results
does not seem to be particularly interesting.



7 Scheduling with Rejection 

In this section we study the problem from Example 4 in the introduction. 

Theorem 13. Let $f$ be a non-negative computable function satisfying the conditions (F*)
and (G*). Then the problem of scheduling with rejection on uniformly related machines
with the objective to minimize the sum of weights of rejected jobs plus $\sum f(C_i)$ possesses
a polynomial approximation scheme.

Let $f$ be a non-negative computable bimodal function satisfying the condition (F*).
Then the problem of scheduling with rejection on uniformly related machines with the
objective to minimize the sum of weights of rejected jobs plus $\max f(C_i)$ possesses a
polynomial approximation scheme.

If the machines are identical, then the same is true (in both cases above) even if $f$ is
computable and satisfies only the condition (F*).

The proof is a modification of our general PTAS. We give only a brief sketch. We
start with the first case, i.e., the objective is penalty plus $\sum f(C_i)$.

We modify the graph $G$ used in our PTAS in the following way. We add $n$ auxiliary
levels between any two levels of the original graph, as well as after the last level. Each
level will again have nodes corresponding to all principal configurations; the target node
will now be the node $\alpha(J)$ on the last auxiliary level. The edges entering the original
nodes and their values will be as before. The edges entering the auxiliary levels will be as
follows. There will be an edge from a configuration $(w, n)$ to $(w', n')$ iff the following
holds: $w \le w'$, $(w', n'') = (w', n') - (w, n)$ is defined, and $n''_i = 0$ for all $i \le \lambda^2/2$.
(The last condition says that $(w', n'')$ represents only sets of jobs with all processing
times greater than $w'/2$.) The value of the edge will be the smallest total penalty of a
set of jobs represented by $(w', n'')$. Additionally, there will be edges between identical
configurations, with weight 0.

The proof that the cost of the shortest path is a good approximation of the optimum 
is along the same lines as for our general PTAS. The rest is omitted. 

The case of minimizing the penalty plus $\max f(C_i)$ is somewhat different. The
obstacle is that the cost of a path in the graph should be the sum of the costs of edges on
certain levels plus the maximum of the costs of edges on the remaining levels; for such
a problem we are not able to use the usual shortest path algorithm.

Let $M$ be some bound on $\max f(C_i)$. We use a similar graph as above, with the
following modification. We include an edge entering an original node only if its value
would be at most $M$; we set its value to 0. Now the cost of the shortest path is an
approximation of the minimal penalty among all schedules with $\max f(C_i) \le M$. More
precisely, similarly as in Lemma 11, if there is a schedule with $\max f(C_i) \le (1 - \varepsilon/3) M$
and total penalty $P$, the shortest path has cost at most $P$; on the other hand, from a path
with cost $P$ we may construct a schedule with $\max f(C_i) \le (1 + \varepsilon/3) M$ and penalty
$P$.

Now we solve the optimization problem by using the procedure above polynomially
many times. Let $s_{\min}$ and $s_{\max}$ be the smallest and the largest machine speeds,
respectively. Let $p_{\min}$ be the minimal processing time of a job, and let $T$ be the total
processing time of all jobs. In any schedule, any non-zero completion time is between
$b = p_{\min}/s_{\max}$ and $B = T/s_{\min}$. Now we cycle through all values $x = (1 + \delta)^i b$,
$i = 0, 1, \ldots$, such that $x \le B$; the constant $\delta$ is chosen by the condition (F*), as in
Section 5. In addition, we consider $x = 0$. The number of such $x$ is polynomial in the
size of the number $B/b$, which is polynomial in the size of the instance. For each $x$,
we compute $M = f(x)$, and find a corresponding schedule with the smallest penalty $P$
by the procedure above. (As a technical detail, we have to round each $x$ to a sufficient
precision so that the length of $x$ is polynomial in the size of the instance; this is possible
to do so that the ratio between successive values of $x$ never exceeds $1 + 2\delta$, and that
is sufficient.) We choose the best of these schedules, and possibly the schedule rejecting
all jobs. Since the relative change between any two successive non-zero values of $x$ is at
most $3\delta$, the relative change between the successive values of $M$ is at most $\varepsilon/3$ (by our
choice of $\delta$ using (F*)), and we cover all relevant values of $M$ with sufficient density.
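
The grid of candidate values can be generated as in the following sketch. The helper name `candidate_values` and the use of exact fractions are our assumptions; the rounding of each value to polynomial length mentioned above is deliberately omitted.

```python
from fractions import Fraction


def candidate_values(b, B, delta):
    """Return 0 followed by all x = (1 + delta)**i * b, i = 0, 1, ..., with x <= B.

    b, B  : lower and upper bound on any non-zero completion time (0 < b)
    delta : the constant chosen by condition (F*); must be positive
    """
    values = [Fraction(0)]
    x = Fraction(b)
    step = 1 + Fraction(delta)
    while x <= B:
        values.append(x)
        x *= step
    return values
```

For each returned value $x$ one would set $M = f(x)$ and run the shortest path procedure; the number of values is about $\log_{1+\delta}(B/b) + 2$.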
Abstract. In this paper we consider the temporary tasks assignment problem. In
this problem, there are m parallel machines and n independent jobs. Each job has
an arrival time, a departure time and some weight. Each job should be assigned
to one machine. The load on a machine at a certain time is the sum of the weights
of jobs assigned to it at that time. The objective is to find an assignment that
minimizes the maximum load over machines and time.

We present a polynomial time approximation scheme for the case in which the
number of machines is fixed. We also show that for the case in which the number
of machines is given as part of the input (i.e., not fixed), no algorithm can achieve
a better approximation ratio than $4/3$ unless P = NP.



1 Introduction 

We consider the off-line problem of non-preemptive load balancing of temporary tasks 
on m identical machines. Each job has an arrival time, departure time and some weight. 
Each job should be assigned to one machine. The load on a machine at a certain time 
is the sum of the weights of jobs assigned to it at that time. The goal is to minimize 
the maximum load over machines and time. Note that the weight and the time are two 
separate axes of the problem. 

The load balancing problem naturally arises in many applications involving allocation
of resources. As a simple concrete example, consider the case where each machine
represents a communication channel with bounded bandwidth. The problem is to assign 
a set of requests for bandwidth, each with a specific time interval, to the channels. The 
utilization of a channel at a specific time t is the total bandwidth of the requests, whose 
time interval contains t, which are assigned to this channel. 

Load balancing of permanent tasks is the special case in which jobs have neither 
an arrival time nor a departure time. This special case is also known as the classical 
scheduling problem which was first introduced by Graham [5,6]. He described a greedy 
algorithm called “List Scheduling” which has a $2 - \frac{1}{m}$ approximation ratio where $m$ is
the number of machines. Interestingly, the same analysis holds also for load balancing 
of temporary tasks. However, until now, it was not known whether better approximation 
algorithms for temporary tasks exist. 

* * * Research supported in part by the Israel Science Foundation and by the US-Israel Binational
Science Foundation (BSF).

For the special case of permanent tasks, there is a polynomial time approximation
scheme (PTAS) for any fixed number of machines [6,10] and also for arbitrary number
of machines by Hochbaum and Shmoys [7]. That is, it is possible to obtain a polynomial
time $(1 + \varepsilon)$-approximation algorithm for any fixed $\varepsilon > 0$.

In contrast we show in this paper that the model of load balancing of temporary tasks 
behaves differently. Specifically, for the case in which the number of machines is fixed 
we present a PTAS. However, for the case in which the number of machines is given as 
part of the input, we show that no algorithm can achieve a better approximation ratio 
than $4/3$ unless P = NP.

Note that similar phenomena occur at other scheduling problems. For example, for 
scheduling (or equivalently, load balancing of permanent jobs) on unrelated machines, 
Lenstra et al. [9] showed on one hand a PTAS for a fixed number of machines. On the 
other hand they showed that no algorithm with an approximation ratio better than $3/2$ for
any number of machines can exist unless P = NP . 

In contrast to our result, in the on-line setting it is impossible to improve the performance
of Graham’s algorithm for temporary tasks even for a fixed number of machines.
Specifically, it is shown in [2] that for any $m$ there is a lower bound of $2 - \frac{1}{m}$ on the
performance ratio of any on-line algorithm (see also [1,3]).

Our algorithm works in four phases: the rounding phase, the combining phase, the 
solving phase and the converting phase. The rounding phase actually consists of two 
subphases. In the first subphase the jobs’ active time is extended: some jobs will arrive 
earlier, others will depart later. In the second subphase, the active time is again extended
but each job is extended in the opposite direction to which it was extended in the
first subphase. In the combining phase, we combine several jobs with the same arrival 
and departure time and unite them into jobs with higher weights. Solving the resulting 
assignment problem in the solving phase is easier and its solution can be converted into 
a solution for the original problem in the converting phase. 

The novelty of our algorithm is in the rounding phase. Standard rounding techniques 
are usually performed on the weights. If one applies similar techniques to the time the 
resulting algorithm’s running time is not polynomial. Thus, we had to design a new 
rounding technique in order to overcome this problem. 

Our lower bound is proved directly by a reduction from exact cover by 3-sets. It 
remains as an open problem whether one can improve the lower bound using more 
sophisticated techniques such as PCP reductions.



2 Notation 



We are given a set of $n$ jobs that should be assigned to one of $m$ identical machines. We
denote the sequence of events by $\sigma = \sigma_1, \ldots, \sigma_{2n}$, where each event is an arrival or a
departure of a job. We view $\sigma$ as a sequence of times; the time $\sigma_i$ is the moment after the
$i$th event happened. We denote the weight of job $j$ by $w_j$, its arrival time by $a_j$ and its
departure time by $d_j$. We say that a job is active at time $\tau$ if $a_j \le \tau < d_j$. An assignment
algorithm for the temporary tasks problem has to assign each job to a machine.

Let $Q_i = \{j \mid a_j \le \sigma_i < d_j\}$ be the active jobs at time $\sigma_i$. For a given algorithm $A$
let $A_j$ be the machine on which job $j$ is assigned. Let

$$l_k^i = \sum_{\{j \mid A_j = k,\ j \in Q_i\}} w_j$$

be the load on machine $k$ at time $\sigma_i$, which is the sum of weights of all jobs assigned to
$k$ and active at this time. The cost of an algorithm $A$ is the maximum load ever achieved
by any of the machines, i.e., $C_A = \max_{i,k} l_k^i$. We compare the performance of an
algorithm to that of an optimal algorithm and define the approximation ratio of $A$ as $r$
if for any sequence $C_A \le r \cdot C_{opt}$, where $C_{opt}$ is the cost of the optimal solution.
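
The cost of a given assignment can be evaluated directly from these definitions, as in the sketch below. The function name `max_load`, the tuple layout of a job, and the simple quadratic loop are our choices for illustration only.

```python
def max_load(jobs, assignment, m):
    """Cost of an assignment: the maximum load over machines and event times.

    jobs       : list of (weight, arrival, departure) with arrival < departure
    assignment : assignment[j] is the machine (0 .. m-1) of job j
    m          : number of machines (m >= 1)
    """
    events = sorted({t for (_, a, d) in jobs for t in (a, d)})
    best = 0
    for t in events:
        load = [0] * m
        for (w, a, d), k in zip(jobs, assignment):
            if a <= t < d:              # job is active at time t
                load[k] += w
        best = max(best, max(load))
    return best
```

For three jobs with (weight, arrival, departure) equal to (2, 0, 3), (1, 1, 2), (3, 2, 5) on two machines, placing the first two jobs together costs 3, while placing the first and the third together costs 5.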



3 The Polynomial Time Approximation Scheme 

Assume without loss of generality that the optimal makespan is in the range $(1, 2]$. That is
possible since Graham’s algorithm can approximate the optimal solution up to a factor
of 2, and thus, we can scale all the jobs’ weights by $2/l$ where $l$ denotes the value of
Graham’s solution.
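
A sketch of this normalization step follows. It assumes that Graham's algorithm for temporary tasks places each job, in order of arrival, on the machine with the smallest current load, and that the scaling factor is $2/l$ (our reading, which puts the optimum into $(1, 2]$ because $l$ is at least the optimum and less than twice the optimum). The names `list_scheduling` and `scale_weights` are ours.

```python
def list_scheduling(jobs, m):
    """Greedy assignment: each job, in order of arrival, goes to the machine
    whose load at the arrival time is smallest.

    jobs : list of (weight, arrival, departure)
    Returns (assignment, value), where value is the maximum load reached.
    """
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][1])
    on_machine = [[] for _ in range(m)]     # jobs placed on each machine so far
    assignment = [None] * len(jobs)
    value = 0
    for j in order:
        w, a, _ = jobs[j]
        # load of machine k at time a: placed jobs that have not departed yet
        loads = [sum(jobs[i][0] for i in on_machine[k] if jobs[i][2] > a)
                 for k in range(m)]
        k = min(range(m), key=loads.__getitem__)
        on_machine[k].append(j)
        assignment[j] = k
        value = max(value, loads[k] + w)    # loads only grow at arrivals
    return assignment, value


def scale_weights(jobs, m):
    """Scale all weights by 2/l, where l is the value of the greedy solution.

    Assumes at least one job of positive weight, so that l > 0."""
    _, l = list_scheduling(jobs, m)
    return [(2 * w / l, a, d) for (w, a, d) in jobs]
```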

We perform a binary search for the value $\lambda$ in the range $(1, 2]$. For each value we
solve the $(1 + \varepsilon)$ relaxed decision problem, that is, either to find a solution of size $(1 + \varepsilon)\lambda$
or to prove that there is no solution of size $\lambda$. From now on we fix the value of $\lambda$.

Fig. 1. Partitioning $J$ into $\{J_i\}$



In order to describe the rounding phase with its two subphases we begin with defining
the partitions based on which the rounding will be performed. We begin by defining a
partition $\{J_i\}$ of the set of jobs $J$. Let $M_i$ be a set of jobs and consider the sequence of
times in which jobs of $M_i$ arrive and depart. Since the number of such times is $2r$ for
some $r$, let $c_i$ be any time between the $r$-th and the $(r+1)$-st elements in that set. The set
$J_i$ contains the jobs in $M_i$ that are active at time $c_i$. The set $M_{2i}$ contains the jobs in $M_i$
that depart before or at $c_i$ and the set $M_{2i+1}$ contains the jobs in $M_i$ that arrive after $c_i$.
We set $M_1 = J$ and define the $M$’s iteratively until we reach empty sets. The important
property of that partition is that the set of jobs that exist at a certain time is partitioned
into at most $\lceil \log n \rceil$ different sets $J_i$.
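
The recursive construction of the sets $J_i$ can be sketched as follows. The function `partition`, the representation of a job as an (arrival, departure) pair, and the choice of $c_i$ as the midpoint between the two middle event times are our assumptions; any time between these two events would do.

```python
def partition(jobs):
    """Split jobs, given as (arrival, departure) pairs with arrival < departure,
    into the sets J_i.  Returns a dict mapping the index i to (c_i, J_i)."""
    parts = {}

    def split(i, M):
        if not M:
            return
        times = sorted(t for job in M for t in job)   # the 2r event times of M_i
        r = len(M)
        c = (times[r - 1] + times[r]) / 2             # between r-th and (r+1)-st
        parts[i] = (c, [(a, d) for (a, d) in M if a <= c < d])
        split(2 * i, [(a, d) for (a, d) in M if d <= c])       # M_{2i}
        split(2 * i + 1, [(a, d) for (a, d) in M if a > c])    # M_{2i+1}

    split(1, list(jobs))
    return parts
```

Each of $M_{2i}$ and $M_{2i+1}$ contains at most half of the jobs of $M_i$, because both events of every job in $M_{2i}$ (respectively $M_{2i+1}$) lie on one side of $c_i$; this is what bounds the depth of the recursion.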
Fig. 2. Partitioning $J_i$ into $\{S_i^j, T_i^j\}$ ($R_i$ is not shown)



We continue by further partitioning the set $J_i$. We separate the jobs whose weight
is greater than a certain constant $\alpha$ and denote them by $R_i$. We order the remaining
jobs according to their arrival time. We denote the smallest prefix of the jobs whose
total weight is at least $\alpha$ by $S_i^1$. Note that its total weight is less than $2\alpha$. We order the
same jobs as before according to their departure time. We take the smallest suffix whose
weight is at least $\alpha$ and denote that set by $T_i^1$. Note that there might be jobs that are both
in $S_i^1$ and $T_i^1$. We remove the jobs and repeat the process with the jobs left in $J_i$ and
define $S_i^2, T_i^2, \ldots, S_i^{k_i}, T_i^{k_i}$. The last pair of sets $S_i^{k_i}$ and $T_i^{k_i}$ may have a weight of less
than $\alpha$. We denote by $s_i^j$ the arrival time of the first job in $S_i^j$ and by $t_i^j$ the departure
time of the last job in $T_i^j$. Note that $s_i^1 \le s_i^2 \le \ldots \le s_i^{k_i} \le c_i < t_i^{k_i} \le \ldots \le t_i^2 \le t_i^1$.

The first subphase of the rounding phase creates a new set of jobs $J'$ which contains
the same jobs as in $J$ with slightly longer active times. We change the arrival time of all
the jobs in $S_i^j$ for $j = 1, \ldots, k_i$ to $s_i^j$. Also, we change the departure time of all the jobs
in $T_i^j$ to $t_i^j$. The jobs in $R_i$ are left unchanged. We denote the sets resulting from the first
subphase by $J'$, $J'_i$, ${S'}_i^j$, ${T'}_i^j$.




Fig. 3. The set $J'_i$ (after the first subphase)



The second subphase of the rounding phase further extends the active time of the
jobs of the first subphase. We take one of the sets $J'_i$ and the partition we defined earlier
to $R_i$, ${S'}_i^1 \cup {T'}_i^1$, ${S'}_i^2 \cup {T'}_i^2$, ..., ${S'}_i^{k_i} \cup {T'}_i^{k_i}$. We order the jobs in ${S'}_i^j$ according to an
increasing order of departure time. We take the smallest prefix of this ordering whose
total weight is at least $\beta$. We extend the departure time of all the jobs in that prefix to
the departure time of the last job in that prefix. The process is repeated until there are
no more jobs in ${S'}_i^j$. The last prefix may have a weight of less than $\beta$. Similarly, extend
the arrival time of jobs in ${T'}_i^j$. The jobs in $R_i$ are left unchanged. We denote the sets
resulting from the second subphase by $J''$, $J''_i$, ${S''}_i^j$, ${T''}_i^j$.




Fig. 4. The set $J''_i$ (after the rounding phase)



The combining phase of the algorithm involves the weight of the jobs. Let $J''_{st}$ be the
set of jobs in $J''$ that arrive at $s$ and depart at $t$. Assume the total weight of jobs whose
weight is at most $\gamma$ in $J''_{st}$ is $a\gamma$. The combining phase replaces these jobs by $\lceil a \rceil$ jobs of
weight $\gamma$. We denote the resulting sets by $J'''_{st}$. The set $J'''$ is created by replacing every
$J''_{st}$ with its corresponding $J'''_{st}$, that is, $J''' = \bigcup_{s<t} J'''_{st}$.
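
For a single set of jobs with common arrival and departure times, the combining step amounts to the following sketch. The function name `combine` and the representation of the set as a plain list of weights are our assumptions.

```python
from fractions import Fraction
from math import ceil


def combine(weights, gamma):
    """Combine the jobs of one set with common arrival and departure times.

    Jobs of weight at most gamma are replaced by ceil(a) jobs of weight gamma,
    where a * gamma is their total weight; heavier jobs are kept unchanged."""
    gamma = Fraction(gamma)
    weights = [Fraction(w) for w in weights]
    large = [w for w in weights if w > gamma]
    a = sum((w for w in weights if w <= gamma), Fraction(0)) / gamma
    return large + [gamma] * ceil(a)
```

With threshold 2, the weights 5, 1, 1, 1 become 5, 2, 2: the three light jobs of total weight 3 are replaced by two jobs of weight 2, so the total weight of the set grows by less than the threshold.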

The solving phase of the algorithm solves the modified decision problem of $J'''$ by
building a layered graph. Every time $\sigma_i \in \sigma$ in which jobs arrive or depart has its own set
of vertices called a layer. In each layer we hold a vertex for every possible assignment of
the current active jobs to machines; that is, an assignment whose makespan is at most $\lambda$
for a certain $\lambda$. Two vertices of adjacent layers are connected by an edge if the transition
from one assignment of the active jobs to the other is consistent with the arrival and
departure of jobs at time $\sigma_i$. Now we can simply check if there is a path from the first
layer to the last layer.

In the converting phase the algorithm converts the assignment found for $J'''$ into an
assignment for $J$. Assume the number of jobs of weight $\gamma$ in $J'''_{st}$ that are assigned to a
certain machine $i$ is $r_i$. Replace these with jobs smaller than $\gamma$ in $J''_{st}$ of total weight of
at most $(r_i + 1)\gamma$. Note that all the jobs will be assigned that way since the replacement
involves jobs whose weight is at most $\gamma$ and from volume consideration there is always
at least one machine with a load of at most $r_i \gamma$ of these jobs. The assignment for $J''$ is
also an assignment for $J'$ and $J$.

4 Analysis 

Lemma 1. For $\alpha = \frac{\varepsilon}{2 \lceil \log n \rceil}$, given a solution whose makespan is $\lambda$ to the original
problem $J$, the same solution applied to $J'$ has a makespan of at most $\lambda + \varepsilon$. Also, given
a solution whose makespan is $\lambda$ to $J'$, the same solution applied to $J$ has a makespan
of at most $\lambda$.

Proof. The second claim is obvious since the jobs in $J$ are shorter than their respective
jobs in $J'$. As for the first claim, every time $\tau$ is contained in at most $\lceil \log n \rceil$ sets $J_i$.
Consider the added load at $\tau$ from jobs in a certain set $J_i$. If $\tau < s_i^1$ or $\tau \ge t_i^1$ then
the same load is caused by $J'_i$ and $J_i$. Assume $\tau < c_i$ and define $s_i^{k_i+1} = c_i$; the other
case is symmetrical. Then for some $j$, $s_i^j \le \tau < s_i^{j+1}$ and the added load at $\tau$ is at most
the total load of $S_i^j$ which is at most $2\alpha$. Summing over all sets $J_i$, we conclude that the
maximal load has increased by at most $2\alpha \lceil \log n \rceil = \varepsilon$.



Lemma 2. For $\beta = \frac{\varepsilon^2}{4 m \lceil \log n \rceil}$, given a solution whose makespan is $\lambda$ to the problem
$J'$, the same solution applied to $J''$ has a makespan of at most $\lambda(1 + \varepsilon)$. Also, given a
solution whose makespan is $\lambda$ to $J''$, the same solution applied to $J'$ has a makespan of
at most $\lambda$.

Proof. The second claim is obvious since the jobs in $J'$ are shorter than their respective
jobs in $J''$. As for the first claim, given a time $\tau$ and a pair of sets ${S'}_i^j$, ${T'}_i^j$ from $J'_i$
we examine the increase in load at $\tau$. If $\tau < s_i^j$ or $\tau \ge t_i^j$ it is not affected by the
transformation because no job in ${T'}_i^j \cup {S'}_i^j$ arrives before $s_i^j$ or departs after $t_i^j$. Assume
that $\tau < c_i$; the other case is symmetrical. So $\tau$ is affected by the decrease in arrival time
of jobs in ${T'}_i^j$. It is clear that the way we extend the jobs in ${T'}_i^j$ increases the load at $\tau$ by
at most $\beta$. Also, since $\tau \ge s_i^j$, we know that the load caused by ${S'}_i^j$ is at least $\alpha$ if $j < k_i$.
Thus, an extra load of at most $\beta$ is created by every pair ${S'}_i^j$, ${T'}_i^j$ for $1 \le j < k_i$ only
if the pair contributes at least $\alpha$ to the load. Also, the last pair ${S'}_i^{k_i}$, ${T'}_i^{k_i}$, over all $i$, contributes
an extra load of at most $\beta \lceil \log n \rceil$. Since the total load on all machines at any time is at most $\lambda m$, the
increase in load and therefore in makespan is at most $\frac{\beta}{\alpha} \lambda m + \beta \lceil \log n \rceil = \frac{\varepsilon \lambda}{2} + \frac{\varepsilon^2}{4m} \le \varepsilon \lambda$.

Lemma 3. For $\gamma = \frac{\varepsilon \beta}{m} = \frac{\varepsilon^3}{4 m^2 \lceil \log n \rceil}$, given a solution whose makespan is $\lambda$ to the
problem $J''$, the modified problem $J'''$ has a solution with a makespan of $\lambda(1 + \varepsilon)$. Also,
given a solution whose makespan is $\lambda$ to the modified problem $J'''$, the solution given
by the converting phase for the problem $J''$ has a makespan of at most $\lambda(1 + \varepsilon)$.

Proof. Consider a solution whose makespan is $\lambda$ to $J''$. If the load of jobs smaller than $\gamma$
in a certain $J''_{st}$ on a certain machine $i$ is $r_i \gamma$, we replace it by at most $\lceil r_i \rceil$ jobs of weight
$\gamma$. Note that this is an assignment to $J'''$ and that the increase in load on every machine
is at most $\gamma$ times the number of sets $J''_{st}$ that contain jobs which are scheduled on that
machine. As for the other direction, consider a solution whose makespan is $\lambda$ to $J'''$.
The increase in load on every machine by the replacement described in the algorithm
is also at most $\gamma$ times the number of sets $J''_{st}$ that contain jobs which are scheduled on
that machine.

The number of sets $J''_{st}$ that can coexist at a certain time is at most $\frac{\lambda m}{\beta}$ since the
weight of each set is at least $\beta$ and the total load at any time is at most $\lambda m$. Therefore,
the increase in makespan is at most $\gamma \frac{\lambda m}{\beta} = \varepsilon \lambda$.

Lemma 4. The running time of the algorithm for solving the relaxed decision problem
for $\lambda$ is bounded by [...]. The running time of the PTAS is the above
bound times $O(\log 1/\varepsilon)$.

Proof. Every layer in the graph stores all the possible assignments of jobs to machines.
Since the smallest job is of weight $\gamma$ and the total load at any time is at most $\lambda m$,
the maximum number of active jobs at a certain time is at most $\lambda m / \gamma$. This bounds the
number of edges in the graph, and hence the running time of the relaxed decision
problem algorithm, by the expression stated in the lemma. The running time of the PTAS is the
above bound times $O(\log 1/\varepsilon)$ since there are $O(\log 1/\varepsilon)$ phases in the binary search
for the appropriate $\lambda$.

5 The Unrestricted Number of Machines Case 

In this section we show that in case the number of machines is given as part of the 
input, the problem cannot be approximated up to a factor of 4/3 in polynomial time 
unless P = NP. We show a reduction from exact cover by 3-sets (X3C), which
is known to be NP-complete [4,8]. In that problem, we are given a set of $3n$ elements,
$A = \{a_1, a_2, \ldots, a_{3n}\}$, and a family $F = \{T_1, \ldots, T_m\}$ of $m$ triples, $F \subseteq A \times A \times A$. Our
goal is to find a covering in $F$, i.e., a subfamily $F'$ for which $|F'| = n$ and $\bigcup_{T_i \in F'} T_i = A$.

Given an instance for the X3C problem we construct an instance for our problem. 
The number of machines is m, the number of triples in the original problem. There are 
three phases in time. First, there are times 1, ..., m, each corresponding to one triple.
Then, times m + 1, ..., m + 3n, each corresponding to an element of A. And finally, the
two times m + 3n + 1, m + 3n + 2. 



Fig. 5. An assignment for the scheduling problem corresponding to m = 4, n = 2, F =
{(1,2,3), (1,4,5), (4,5,6), (2,3,4)}

There are four types of jobs. The first type are $m$ jobs of weight 3 starting at time 0.
Job $r$, $1 \le r \le m$, ends at time $r$. To every appearance of $a_j$ in a triple $T_i$ corresponds a
job of the second type of weight 1 that starts at time $i$ and ends at time $m + j$, and another job of
the third type of weight 1 that starts at time $m + j$. Among all the jobs that start at time
$m + j$, one ends at $m + 3n + 2$ while the rest end at $m + 3n + 1$. The fourth type of
jobs are $m - n$ jobs of weight 3 that start at $m + 3n + 1$ and end at $m + 3n + 2$.
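
The construction can be made concrete with a small sketch. The code below is only an illustration, not part of the original reduction: the names `Job`, `build_instance`, and `total_load` are hypothetical, a job is assumed to occupy the half-open interval [start, end), and every element is assumed to occur in at least one triple. It builds the four job types for the instance of Fig. 5 and records the total load at every integer time.

```python
from collections import namedtuple

# A job is assumed to be active on the half-open interval [start, end).
Job = namedtuple("Job", "kind weight start end")

def build_instance(n, triples):
    """Build the jobs of the reduction from an X3C instance.

    `triples` is a list of m triples over the elements 1..3n (1-based).
    """
    m = len(triples)
    jobs = []
    # Type 1: m jobs of weight 3; job r runs from time 0 to time r.
    for r in range(1, m + 1):
        jobs.append(Job(1, 3, 0, r))
    # Types 2 and 3: one pair of unit jobs per appearance of a_j in T_i.
    has_long_job = set()
    for i, triple in enumerate(triples, start=1):
        for j in triple:
            jobs.append(Job(2, 1, i, m + j))
            # Exactly one type-3 job per element is the "long" one.
            if j not in has_long_job:
                has_long_job.add(j)
                end = m + 3 * n + 2
            else:
                end = m + 3 * n + 1
            jobs.append(Job(3, 1, m + j, end))
    # Type 4: m - n jobs of weight 3 in the last unit interval.
    for _ in range(m - n):
        jobs.append(Job(4, 3, m + 3 * n + 1, m + 3 * n + 2))
    return jobs

def total_load(jobs, t):
    """Sum of the weights of all jobs active at time t."""
    return sum(job.weight for job in jobs if job.start <= t < job.end)

# Instance of Fig. 5: m = 4, n = 2.
triples = [(1, 2, 3), (1, 4, 5), (4, 5, 6), (2, 3, 4)]
jobs = build_instance(2, triples)
loads = [total_load(jobs, t) for t in range(0, 4 + 3 * 2 + 2)]
```

Under these assumptions the total load is 3m at every integer time in the schedule, which is the property the second direction of the proof relies on.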

We show that there is a schedule with makespan at most 3 if and only if there is an 
exact cover by 3-sets. Suppose there is a cover. We schedule a job of the first type that 
ends at time i to machine i. We schedule the three jobs of the second type corresponding 
to Ti to machine i. At time m + j, some jobs of type two depart and the same number 
of jobs of type three arrives. One of these jobs is longer than the others since it ends at 
time m + 3n + 2. We schedule that longer job to machine i where Tj is the triple in 
the covering that contains j. At time m + 3n + 1 many jobs depart. We are left with 
3n jobs, three jobs on each of the n machines corresponding to the 3-sets chosen in the 
cover. Therefore, we can schedule the m — n jobs of the fourth type on the remaining 
machines. 

Now, assume that there is a schedule whose makespan is at most 3. One important 
property of our scheduling problem is that at any time $\tau$, $0 \le \tau < m + 3n + 2$, the total
load remains at 3m so the load on each machine has to be 3. We look at the schedule 
at time m + 3n + 1. Many jobs of type three depart and only the long ones stay. The 
number of these jobs is 3n and their weight is 1. Since m — n jobs of weight 3 arrive 
at time m + 3n + 1, the 3n jobs must be scheduled on n machines. We take the triples 
corresponding to the n machines to be our covering. Assume by contradiction that this is 
not a covering. Therefore, there are two 3-sets that contain the same element, say $a_j$. At
time m + j only one long job arrives. The machine in which a shorter job was scheduled 
remains with a load of 3 until time m + 3n + 1 and then the short job departs and its 
load decreases to at most 2. This is a contradiction since at time m + 3n + 1 there are 
n machines each with 3 long jobs. 

Corollary 5. For every $\rho < 4/3$, there does not exist a polynomial $\rho$-approximation
algorithm for the temporary tasks assignment problem unless P = NP.

Abstract. In parallel computation we often need an algorithm for dividing one
computationally expensive job into a fixed number, say N, of subjobs, which can be
processed in parallel (with reasonable overhead due to additional communication). 
In practice it is often easier to repeatedly bisect jobs, i.e., split one job into exactly 
two subjobs, than to generate N subjobs at once. In order to balance the load among 
the N machines, we want to minimize the size of the largest subjob (according to 
some measure, like cpu-time or memory usage). 

In this paper we study a recently presented load balancing algorithm, called Heaviest
First Algorithm (Algorithm HF), that is applicable to all classes of problems
for which bisections can be computed efficiently. This algorithm implements a very 
natural strategy: During N — 1 iterations we always bisect the largest subproblem 
generated so far. 

The maximum load produced by this algorithm has previously been shown to differ 
from the optimum only by a constant factor even in the worst-case. In this paper 
we consider the average-case, assuming a natural and rather pessimistic random 
distribution for the quality of the bisections. Under this model the heaviest load 
generated by Algorithm HF is proved to be only twice as large as the optimum 
with high probability. Furthermore, our analysis suggests a simpler version of 
Algorithm HF which can easily be parallelized. 



1 Introduction 

Dynamic load balancing for irregular problems is a major research issue in the context of
parallel and distributed computing. Often it is essential to achieve a balanced distribution
of load in order to reduce the execution time of an application or to maximize system
throughput.

We consider a general scenario, where an irregular problem is generated at run-time 
and must be split into subproblems that can be processed on different processors. If N 
processors are available, the problem is to be split into N subproblems and the goal is to 
balance the weight of the subproblems assigned to the individual processors. The weight 
of a problem (or subproblem) represents the load (CPU load, for example) caused by 
the problem on the processor to which it is assigned. It is assumed that the weight of a 
problem can be calculated (or approximated) easily once it is generated.

Instead of splitting the original problem into N subproblems in a single step, we 
use repeated bisections for generating the N subproblems, because in practice it is 
often easier to find efficient bisection methods than to find a method for splitting a
problem into an arbitrary number of subproblems in one step. A bisection subdivides 
a problem into exactly two subproblems. Optimization problems in planar graphs (and 
their generalizations) are a classical field where a similar approach is commonly used 
(see for example [LT79, AST90]), and which may be adapted to fit into this model. 

Load balancing using bisectors can and has been applied to a variety of practical 
problems. Our research is motivated by an application from distributed numerical simu- 
lations, namely a parallel solver for partial differential equations using a variant of the 
finite element method (FEM) with adaptive, recursive substructuring [HS94, BEE98]. 
Other possible application domains include chip layout [RR93] or multi-dimensional 
adaptive numerical quadrature [Bon93]. 

In this paper, we will not be concerned with details regarding any particular appli- 
cation. Instead, we study parallel load balancing from a more abstract point of view. We 
only assume that bisectors can be computed efficiently for the problems under conside- 
ration. 

The remainder of the paper is organized as follows. In Sect. 2 we introduce our load 
balancing model and briefly review previous results regarding the worst-case behavior of 
Algorithm HF. In Sect. 3 we present our main result and the theoretical analysis. Finally, 
Sect. 4 contains some concluding remarks. 

2 Load Balancing Model and Previous Work 

As in [BEE98], we study the following simplified model for dynamic load balancing. 
The parallel system, on which we want to solve a problem p, consists of N processors or 
machines. The goal of the load balancing is to split $p$ into $N$ subproblems $p_1, \ldots, p_N$,
which can be solved individually. In our analysis we neglect the overhead which results 
from the combination of the partial solutions to the solution of the initial problem. This 
simplification is justified if the bisections produce only loosely coupled subproblems. 
Assuming a weight function w that measures the resource demand (for example, CPU 
load or memory requirement, depending on the application at hand) of subproblems, 
the goal is to minimize the maximum weight among the resulting subproblems, i.e., to
minimize $\max_{1 \le i \le N} w(p_i)$.

As mentioned before, we assume that the load balancing algorithm can only split a 
subproblem into two subproblems in a single step, using a bisection method. Repeated 
bisections must then be used to split a problem into $N$ subproblems. If problem $q$ is
bisected into subproblems $q_1$, $q_2$, we assume that $w(q) = w(q_1) + w(q_2)$, i.e., the sum
of the weights of the subjobs equals the weight of the initial job.

In general, it cannot be expected that bisections always split a problem p of weight
w(p) into two subproblems with weight w(p)/2 each. For many classes of problems, 
however, there are bisection methods that guarantee that the weights of the two obtained 
subproblems do not differ too much. The following definition captures this concept more 
precisely. 

Definition 1. Let $0 < \alpha \le \frac{1}{2}$. A class $\mathcal{P}$ of problems with weight function $w : \mathcal{P} \to \mathbb{R}^+$
has $\alpha$-bisectors if every problem $p \in \mathcal{P}$ can be efficiently divided into two problems $p_1 \in \mathcal{P}$
and $p_2 \in \mathcal{P}$ with $w(p_1) + w(p_2) = w(p)$ and $w(p_1), w(p_2) \in [\alpha w(p); (1 - \alpha) w(p)]$.

algorithm HF(p, N)
begin
    P := {p};
    while |P| < N do
    begin
        q := a problem in P with maximum weight;
        bisect q into q1 and q2;
        P := (P ∪ {q1, q2}) \ {q};
    end;
    return P;
end.



Fig. 1. Algorithm HF (Heaviest Problem First) 



Fig. 1 shows Algorithm HF, which receives a problem p and a number N of processors 
as input and divides p into N subproblems by repeated bisection of the heaviest remaining 
subproblem. There is also an efficient parallel version of this algorithm as shown in 
[BEE99]. 
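
As an illustration only, the loop of Fig. 1 can be sketched with a binary heap acting as the priority queue. The function name `heaviest_first` and the callback `bisect` are hypothetical: problems are represented by their weights alone, and `bisect(w)` is assumed to return the weights of the two subproblems, which must sum to `w`.

```python
import heapq
from itertools import count

def heaviest_first(weight, n, bisect):
    """Split a problem of the given weight into n parts following Fig. 1.

    `bisect(w)` is an assumed callback returning the weights (w1, w2)
    of the two subproblems of a problem of weight w, with w1 + w2 == w.
    """
    tie = count()                       # tie-breaker for equal weights
    heap = [(-weight, next(tie))]       # max-heap via negated weights
    while len(heap) < n:
        w = -heapq.heappop(heap)[0]     # heaviest remaining subproblem
        w1, w2 = bisect(w)
        heapq.heappush(heap, (-w1, next(tie)))
        heapq.heappush(heap, (-w2, next(tie)))
    return sorted((-w for w, _ in heap), reverse=True)

# Example: a deterministic bisector that always splits 1/3 : 2/3.
parts = heaviest_first(1.0, 8, lambda w: (w / 3, 2 * w / 3))
```

With this fixed 1/3 : 2/3 bisector and N = 8, the heaviest of the eight parts has weight 16/81, which lies below 2 w(p)/N = 0.25.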

A perfectly balanced load distribution on $N$ processors would be achieved if a problem
$p$ of weight $w(p)$ was divided into $N$ subproblems of weight exactly $w(p)/N$
each.

The following theorem shows that the worst-case behavior of Algorithm HF differs
from the optimum only by a factor depending on $\alpha$ but not on $N$.



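Theorem 2. [BEE98] Let $\mathcal{P}$ be a class of problems with weight function $w : \mathcal{P} \to \mathbb{R}^+$
that has $\alpha$-bisectors. Given a problem $p \in \mathcal{P}$ and a positive integer $N$, Algorithm HF
uses $N - 1$ bisections to partition $p$ into $N$ subproblems denoted by $p_1, \ldots, p_N$ such
that

\[ \max_{1 \le i \le N} w(p_i) \le \frac{w(p)}{N} \cdot r_\alpha , \qquad \text{where } r_\alpha = \left\lfloor \frac{1}{\alpha} \right\rfloor \cdot (1 - \alpha)^{\lfloor 1/\alpha \rfloor - 2} . \]

Note that $r_\alpha$ is equal to 2 for $\alpha \ge 1/3$, below 3 for $\alpha > 1 - 1/\sqrt[4]{2} \approx 0.159$, and
below 10 for $\alpha > 0.04$. Hence, Algorithm HF achieves provably good load balancing
for classes of problems with $\alpha$-bisectors for a surprisingly large range of $\alpha$.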
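
The three thresholds quoted above can be checked numerically. The sketch below assumes that the worst-case factor has the form $\lfloor 1/\alpha \rfloor \cdot (1 - \alpha)^{\lfloor 1/\alpha \rfloor - 2}$, which is the form consistent with those thresholds; the function name `r_alpha` is hypothetical, and exact rational arithmetic is used so that the floor is not affected by rounding.

```python
from fractions import Fraction
from math import floor

def r_alpha(alpha):
    """Assumed worst-case factor floor(1/a) * (1 - a)^(floor(1/a) - 2).

    `alpha` should be a Fraction with 0 < alpha <= 1/2.
    """
    k = floor(1 / alpha)
    return k * (1 - alpha) ** (k - 2)

# A few sample values on both sides of the threshold near 0.159.
samples = {a: r_alpha(a) for a in
           (Fraction(1, 2), Fraction(1, 3), Fraction(4, 25), Fraction(3, 20))}
```

For example, the factor is exactly 2 at $\alpha = 1/2$ and $\alpha = 1/3$, drops below 3 at $\alpha = 0.16$, and exceeds 3 at $\alpha = 0.15$.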



3 Average-Case Analysis of Algorithm HF 

3.1 Main Result 

The following stochastic model for an average-case scenario that may arise from practical
applications seems reasonable: If a problem $q$ is bisected into $q_1$ and $q_2$, we can find a
bisection parameter $\alpha \in [0, \frac{1}{2}]$ such that $w(q_1) = \alpha w(q)$ and $w(q_2) = (1 - \alpha) w(q)$.

Assume that the actual bisection parameter $\alpha$ is drawn uniformly at random from the
interval $[\bar\alpha, \beta]$, $0 \le \bar\alpha \le \beta \le 1/2$, and that all $N - 1$ bisection steps are independent
and identically distributed. We will write $\alpha \sim U[\bar\alpha, \beta]$ if $\alpha$ is uniformly distributed on
$[\bar\alpha, \beta]$. In this paper, we study the case $\bar\alpha = 0$, $\beta = \frac{1}{2}$, which seems rather pessimistic
for practical applications, where in many cases $\alpha$-bisectors (for $\alpha > 0$) are available.
Note that the assumption $\alpha \sim U[0, \frac{1}{2}]$ could be changed to $\alpha \sim U(0, \frac{1}{2}]$ throughout our
analysis without any modifications in the proofs.

Using this model we obtain the following result: 



Theorem 3. If Algorithm HF starts with a problem of weight $W$ and produces a set $P$
of $N$ subproblems after $N - 1$ bisections then for $\varepsilon = 9\sqrt{\ln(N)/N}$

\[ \Pr\left[ (2 - \varepsilon) \frac{W}{N} \le W_{\max} \le (2 + \varepsilon) \frac{W}{N} \right] = 1 - o(1/N) , \]

where $W_{\max} := \max\{w(p) : p \in P\}$ denotes the weight of the heaviest subproblem
in $P$.

Thus, with high probability the maximum load differs from the best attainable value
$W/N$ only by a factor of roughly two.



3.2 Outline of the Analysis 

For our analysis we assume that Algorithm HF executes infinitely many iterations of the 
loop and consider the infinite bisection tree (IBT) generated by this process. This tree 
grows larger with each iteration of Algorithm HF. There is a one-to-one mapping from 
the subproblems produced by Algorithm HF to the nodes in the IBT. At every point
in time the subproblems in P correspond to the leaves in the part of the IBT which has
been generated so far. The root is the initial problem p. If a node/subproblem q is split,
two new nodes/subproblems $q_1$ and $q_2$ are appended to q and, thus, the IBT is an infinite
binary tree. One can also imagine that the IBT exists a priori and Algorithm HF visits 
all nodes one by one. When we say that Algorithm HF visits or expands a node in the 
IBT we adopt this view on the model. The IBT is the infinite version of the bisection 
tree (BT) employed in [BEE98]. 

Now let us define the probability space more formally: We set $\Omega := [0, 1]^{\mathbb{N}}$ with the
uniform distribution. As usual in the continuous case, the set of events is restricted to
the Borel $\sigma$-field on $[0, 1]^{\mathbb{N}}$, which we denote by $\mathcal{F}$.

An element $s := (\alpha_1, \alpha_2, \ldots) \in \Omega$ has the following interpretation: $\alpha_i \sim U[0, 1]$
corresponds to the relative weight of the, say, left successor of the node which is split by
the $i$-th bisection. We do not consider the actual bisection parameter (which is equal to
$\alpha_i$ or $(1 - \alpha_i)$), since studying $U[0, 1]$ instead of $U[0, \frac{1}{2}]$ simplifies some calculations
and we do not care which successor node is heavier. We call $\alpha_i$ the generalized bisection
parameter of bisection $i$. Note that for every $s \in \Omega$ the corresponding IBT can be
computed by simulating Algorithm HF.

We call a node $v$ in the IBT $d$-heavy iff $w(v) \ge d$, and accordingly we use the term
$d$-light. If the value of $d$ is obvious from the context we just say heavy and light for short.

For every node $v$ in the complete binary tree $T$ we define the random variables
(for $d > 0$)

\[ H_v^d := \begin{cases} 1 & \text{if } w(v) \ge d \\ 0 & \text{otherwise.} \end{cases} \]

Since $T$ is a superset of the nodes in any IBT, the random variable $H^d := \sum_{v \in T} H_v^d$
counts the number of $d$-heavy nodes in the IBT. If a node $v$ is not part of the IBT, $H_v^d$ is
zero.

Additionally, we define for each $v \in T$ a random variable $X_v$. If $v$ is part of the IBT,
$X_v$ denotes the weight of $v$ relative to its parent $p$ in the tree, i.e., $X_v := w(v)/w(p)$.
Thus, $X_v$ corresponds to the generalized bisection parameter $\alpha$ and, as mentioned above,
we will assume that $X_v \sim U[0, 1]$. We will not be interested in the value of $X_v$ when $v$ is
not part of the IBT, so we do not specify how $X_v$ behaves in this case.

The random variables $H^d$ directly correspond to the performance of Algorithm HF,
since $H^d$ equals the number of iterations after which the set $P$ of subproblems generated
by Algorithm HF contains only light nodes. This is due to the fact that Algorithm HF
visits all $d$-heavy nodes before it expands any $d$-light node.

Our analysis proceeds as follows: First we show that the expected number of heavy
nodes $E[H^d]$ is comparatively small. Then we prove that $H^d$ is sharply concentrated
around $E[H^d]$ and thus $H^d$ is small with high probability. Finally, the desired results for
Algorithm HF follow easily.



3.3 Expected Number of Heavy Nodes 

If we look at a node $v_l$ on level $l$ in the IBT (level 0 contains only the root, which has
weight $W$; level $i$ contains all nodes at BFS-distance $i$) and denote its ancestors on the
levels $0, \ldots, l - 1$ by $v_0, \ldots, v_{l-1}$, we get

\[ w(v_l) = W \cdot \prod_{i=1}^{l} X_{v_i} . \]

The following lemma enables us to analyze the distribution of products of $U[0, 1]$-distributed
random variables exactly.



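Lemma 4. Let $X_1, \ldots, X_n$ be independent random variables with the exponential
distribution and $E[X_1] = \cdots = E[X_n] = 1/\lambda$. Then for $X := \sum_{i=1}^{n} X_i$

\[ \Pr[X \le t] = \begin{cases} 1 - e^{-\lambda t} \sum_{i=0}^{n-1} \frac{(\lambda t)^i}{i!} = e^{-\lambda t} \sum_{i=n}^{\infty} \frac{(\lambda t)^i}{i!} & \text{if } t > 0 \\ 0 & \text{otherwise.} \end{cases} \]

Proof. See [Fel71, p. 11].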



It is easy to show (see [Fel71, p. 25]) that a random variable $X_i \sim U[0, 1]$ can
be transformed to a random variable $Y_i := -\ln X_i$ with exponential distribution and
$E[Y_i] = 1$. If we want to analyze $X := \prod_{i=1}^{n} X_i$ we can equivalently analyze
$Y := -\ln X = \sum_{i=1}^{n} Y_i$. Combining this fact with Lemma 4 yields the following lemma.

Lemma 5. Let $X := \prod_{i=1}^{n} X_i$ be the product of independent random variables with
$X_i \sim U[0, 1]$. If we define $Y$ as $Y := -\ln X$ we obtain for $t \in [0, 1]$

\[ \Pr[X \ge t] = \Pr[Y \le -\ln t] = t \cdot \sum_{i=n}^{\infty} \frac{(-\ln t)^i}{i!} . \]

Proof. See [Fel71, p. 25].

In the following discussion we will often be concerned with the probability $p_l^d$ that
an arbitrary node on level $l$ in the IBT is $d$-heavy.

Lemma 6. Let $v_l$ be an arbitrary node on level $l$ of the IBT and denote the ancestors of
$v_l$ in the tree by $v_0, \ldots, v_{l-1}$. If the root of the IBT has weight $W$ ($W \ge d$ for a fixed
$d > 0$) then for the probability

\[ p_l^d := \Pr[w(v_l) \ge d] \]

the following holds (with $l > \ln(W/d)$ and $l \ge 1$):

\[ p_l^d = \frac{d}{W} \sum_{i=l}^{\infty} \frac{(\ln(W/d))^i}{i!} \le \frac{d}{W} \left( \frac{\ln(W/d) \cdot e}{l} \right)^{l} . \]



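Proof. Since $w(v_l)$ is the product of $W$ and the generalized bisection parameters of the
ancestors of $v_l$ in the IBT we obtain using Lemma 5

\[ p_l^d = \Pr\left[ W \cdot \prod_{i=1}^{l} X_{v_i} \ge d \right] = \Pr\left[ \prod_{i=1}^{l} X_{v_i} \ge \frac{d}{W} \right] = \frac{d}{W} \sum_{i=l}^{\infty} \frac{(\ln(W/d))^i}{i!} . \]

This proves the first equation. The second inequality can be shown as follows (using $\ln(W/d) < l$):

\[ \sum_{i=l}^{\infty} \frac{(\ln(W/d))^i}{i!} = (\ln(W/d))^l \sum_{i=0}^{\infty} \frac{(\ln(W/d))^i}{(i+l)!} \le (\ln(W/d))^l \sum_{i=0}^{\infty} \frac{l^i}{(i+l)!} \]
\[ = (\ln(W/d))^l \cdot \frac{1}{l^l} \sum_{i=0}^{\infty} \frac{l^{i+l}}{(i+l)!} \le (\ln(W/d))^l \cdot \frac{e^l}{l^l} . \]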



Now we are in a position to state our first main result, namely the expected value for 
the number of heavy nodes in the IBT. 



Theorem 7. For an IBT where the root has weight $W$ with $W \ge d$, it holds that

\[ E[H^d] = \frac{2}{d} W - 1 . \]

Proof. Let $T(l)$ denote the set of nodes on level $l$ in the complete binary tree. For a
single node $v \in T(l)$ we obtain

\[ E[H_v^d] = p_l^d \cdot \Pr[v \in \mathrm{IBT}] = p_l^d , \]

because $\Pr[v \in \mathrm{IBT}] = 1$. We just sketch how this can be shown: If $v \notin \mathrm{IBT}$ we
can find a node $v' \in \mathrm{IBT}$ which is never bisected. This implies that there are infinitely
many nodes in the tree with weight at least $d' := w(v')$. Furthermore, we observe
that sequences $s \in [0, 1]^{\mathbb{N}}$ where only finitely many components belong to $[\varepsilon, 1 - \varepsilon]$
(for $0 < \varepsilon < 0.5$) occur with probability zero. If, on the other hand, infinitely many
components belong to $[\varepsilon, 1 - \varepsilon]$ there are only finitely many $d'$-heavy nodes in the tree.

The theorem follows by linearity of expectation using the value of $p_l^d$ which we
derived in Lemma 6.



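\[ E[H^d] = \sum_{l=0}^{\infty} \sum_{v \in T(l)} E[H_v^d] = \sum_{l=0}^{\infty} 2^l p_l^d
 = \frac{d}{W} \sum_{l=0}^{\infty} \sum_{i=l}^{\infty} 2^l \frac{(\ln(W/d))^i}{i!}
 = \frac{d}{W} \sum_{i=0}^{\infty} \sum_{l=0}^{i} 2^l \frac{(\ln(W/d))^i}{i!} \]
\[ = \frac{d}{W} \sum_{i=0}^{\infty} \left( 2^{i+1} - 1 \right) \frac{(\ln(W/d))^i}{i!}
 = \frac{d}{W} \left( 2 \left( \frac{W}{d} \right)^{2} - \frac{W}{d} \right)
 = \frac{2}{d} W - 1 . \]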
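
The closed form $E[H^d] = \frac{2}{d} W - 1$ of Theorem 7 can be checked empirically. The following sketch is only an illustration under the stochastic model of this section: the names `count_heavy_nodes` and `mean_heavy` are hypothetical, and each heavy node is split with a generalized bisection parameter drawn from U[0, 1). Light nodes are not expanded, since all their descendants are light as well.

```python
import random

def count_heavy_nodes(W, d, rng):
    """Count the d-heavy nodes of one random bisection tree with root weight W."""
    heavy = 0
    stack = [W]
    while stack:
        w = stack.pop()
        if w < d:
            continue              # light node: every descendant is light too
        heavy += 1
        x = rng.random()          # generalized bisection parameter
        stack.append(x * w)
        stack.append((1.0 - x) * w)
    return heavy

def mean_heavy(W, d, trials, seed=0):
    """Average number of d-heavy nodes over several independent trees."""
    rng = random.Random(seed)
    return sum(count_heavy_nodes(W, d, rng) for _ in range(trials)) / trials
```

For W = 1 and d = 0.01 the formula predicts 199 heavy nodes on average, and a call such as `mean_heavy(1.0, 0.01, 400)` should return a value close to that.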



3.4 Sharp Concentration 

Theorem 7 gives us an idea of how well Algorithm HF performs. If we set $d = 2W/N$ we
get $E[H^d] = N - 1$. This is exactly the number of bisections that are necessary to produce
$N$ subproblems. Thus, on the average, we have bisected every heavy subproblem after
$N - 1$ iterations of Algorithm HF, and all the $N$ generated subproblems have weight
smaller than $2W/N$, which exceeds the optimal value $W/N$ only by a factor of two. In
the following part we will show that Algorithm HF really behaves the way this intuitive
argument suggests. This is due to the fact that $H^d$ is sharply concentrated around its
expectation.

The following lemma shows that with high probability all heavy nodes reside rather 
close to the root of the IBT. 



Lemma 8. With probability $1 - o(1/N)$ all nodes of the IBT on level $l$ with $l \ge k \cdot
\ln((WN)/c)$ and $k > 2e$ are $(c/N)$-light (for $W \ge c/N$ and $c > 0$).

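Proof. Setting $l := k \cdot \ln((WN)/c)$ for $k > 1$ such that $l \in \mathbb{N}$ and applying Lemma 6
for $d = c/N$, we obtain (there are at most $2^l$ nodes on level $l$)

\[ \Pr[\exists\ d\text{-heavy node on level } l] \le 2^{k \ln((WN)/c)} \cdot \frac{c}{WN} \cdot \left( \frac{e}{k} \right)^{k \ln((WN)/c)} = \left( \frac{c}{WN} \right)^{1 + k \ln k - k - k \ln 2} . \]

A simple analysis of the exponent shows that $\Pr[\exists\ d\text{-heavy node on level } l] = o(1/N)$
for $k > 2e$.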



Remark 9. The error term $o(1/N)$ in Lemma 8 is chosen rather arbitrarily and could be
changed to $o(1/\mathrm{poly}(N))$ without essential changes in the proofs.

Lemma 8 immediately yields a rather weak upper bound on the total number of 
heavy nodes, which we will improve later. 

Corollary 10. With probability $1 - o(1/N)$ it holds that $H^{c/N} = O(N \log N)$ (for
$c > 0$).

Proof. Lemma 8 shows that with high probability only the first, say, $6 \cdot \ln(WN/c)$ levels
contain heavy nodes. On every level the number of $(c/N)$-heavy nodes is at most $WN/c$
since the weights of all nodes on the same level must have sum $W$.

Since we want to show that with high probability $H^{2W/N}$ is close to $N - 1$, Corollary 10 is still
far off from our desired result, but we already know that the possibly infinite runtime of
Algorithm HF until all subproblems are $(2W/N)$-light is $O(N \log N)$ with probability
$1 - o(1/N)$. In order to improve this result, we define a martingale and apply the method
of bounded differences to show a sharp concentration around the mean.

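For the definition of the martingale we denote by $\mathcal{F}_i := \sigma(\alpha_1, \ldots, \alpha_i)$ the $\sigma$-algebra
generated by the random variables $\alpha_1, \ldots, \alpha_i$ (for $i \ge 1$) (see [Fel71, p. 116f]). Furthermore we
set $\mathcal{F}_0 := \{\emptyset, \Omega\}$. The sequence $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}$ forms a filtration. As mentioned in
[Fel71, p. 212f]

\[ Z_i^d := E[H^d \mid \mathcal{F}_i] \]

defines a martingale, which is also called a Doob-martingale. It holds that $Z_0^d = E[H^d]$.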

$Z_i^d$ is a function from $\Omega$ to $\mathbb{R}$. The intuitive interpretation of $Z_i^d$ is as follows: Given a
sequence $s := (\alpha_1, \ldots, \alpha_i, \alpha_{i+1}, \ldots) \in \Omega$ of generalized bisection parameters, $Z_i^d(s)$ tells
us how many heavy nodes we expect in the IBT if the generalized bisection parameters
of the first $i$ bisections correspond to $s_i := (\alpha_1, \ldots, \alpha_i)$.

Let $T_i$ denote the part of the IBT visited by Algorithm HF up to the $i$-th iteration
(for example $T_0$ contains only the root of the tree). When we know $s_i$ we can simulate
Algorithm HF and compute $T_i$. Therefore, evaluating $Z_i^d$ corresponds to calculating the
expected value of $H^d$, given the tree $T_i$ which is generated by the first $i$ bisections of
Algorithm HF. In order to capture this intuition, we use the following notation:

\[ Z_i^d = E[H^d \mid \mathcal{F}_i] =: E[H^d \mid T_i] . \]

In the following lemma we show that $|Z_{i+1}^d - Z_i^d|$ is small. This is a prerequisite for
the application of the method of bounded differences.

Lemma 11. For all $i \ge 0$ it holds that $|Z_{i+1}^d - Z_i^d| \le 2$.

Proof. Throughout this proof we assume that $s \in \Omega$ is fixed, and show that
$|Z_{i+1}^d(s) - Z_i^d(s)| \le 2$.

Since we know the prefixes $s_i$ and $s_{i+1}$ of $s$, we can compute the trees $T_i$ and $T_{i+1}$.
Let $v$ denote the node that is bisected at the $(i + 1)$-st iteration.

If $v$ is light ($w(v) < d$) the claim follows easily: All nodes not yet visited by Algorithm
HF must also be light, because $v$ is a heaviest leaf in the expanded part of the tree. Thus,
we have already seen all heavy nodes and obtain $Z_{i+1}^d = Z_i^d = H^d$.

Now we assume that $w(v) \ge d$ (in the sequel we write $w := w(v)$ for short). Let $v_1$
and $v_2$ be the two nodes generated by the bisection of $v$. In addition, let $w(v_1) = \alpha w$
and $w(v_2) = (1 - \alpha) w$ (without loss of generality $0.5 \le \alpha \le 1$).

We set $T_i = I \cup L \cup \{v\}$ and $T_{i+1} = I \cup L \cup \{v\} \cup \{v_1\} \cup \{v_2\}$, where $I$ denotes the
set of interior nodes of $T_i$ and $L$ denotes the set of leaves of $T_i$ except $v$. $T$ is the complete
binary tree (as before) and $T(u)$ is the subtree of $T$ beginning at node $u$. Moreover, we
introduce the abbreviation $H^d(u \mid T_i) := \sum_{x \in T(u)} E[H_x^d \mid T_i]$ for the expected number
of heavy nodes in $T(u)$ under the condition that we already know the prefix $T_i$ of the
IBT.

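We observe that all nodes in $I$ must be $d$-heavy. This yields

\[ Z_i^d = E[H^d \mid T_i] = \sum_{u \in T} E[H_u^d \mid T_i] = |I| + \sum_{u \in L} H^d(u \mid T_i) + H^d(v \mid T_i) \]
\[ Z_{i+1}^d = |I| + \sum_{u \in L} H^d(u \mid T_i) + 1 + H^d(v_1 \mid T_{i+1}) + H^d(v_2 \mid T_{i+1}) , \]

since $E[H_v^d \mid T_{i+1}] = 1$, because $v$ is heavy.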

We bound $|Z_i^d - Z_{i+1}^d|$ by considering three cases and using estimates for $H^d(\tau \mid T_i)$
(for $\tau \in \{v, v_1, v_2\}$). Note that, if $u$ is a $d$-heavy leaf in $T_i$, we can use Theorem 7 and
obtain $H^d(u \mid T_i) = \frac{2}{d} w(u) - 1$.

Case 1: $\alpha w < d$. In this case $v_1$ and $v_2$ are light. Therefore, it holds that $w < 2d$ and
$H^d(v_1 \mid T_{i+1}) = H^d(v_2 \mid T_{i+1}) = 0$. It follows that

\[ Z_i^d - Z_{i+1}^d = H^d(v \mid T_i) - \left( 1 + H^d(v_1 \mid T_{i+1}) + H^d(v_2 \mid T_{i+1}) \right) = \frac{2}{d} w - 1 - 1 < 2 . \]

Furthermore, we obtain due to $w \ge d$ that $Z_i^d - Z_{i+1}^d \ge 0$.

The cases Case 2: $(1 - \alpha) w < d$ and $\alpha w \ge d$, and Case 3: $(1 - \alpha) w \ge d$ are
similar.

The next lemma is basic probability theory, but we prefer to state it separately in 
order to render the proof of the main theorem more readable. 

Lemma 12. Let $A$ and $B$ be two events over a probability space $\Omega$. If $\Pr[B] = 1 -
o(1/N)$ then

\[ \Pr[A] \le \Pr[A \cap B] + o(1/N) . \]

Proof. Bound the terms in $\Pr[A] = \Pr[A \mid B] \Pr[B] + \Pr[A \mid \bar{B}] \Pr[\bar{B}]$.

We will use the following theorem from the method of bounded differences (see 
[McD89]): 

Theorem 13. Let $Z_0, Z_1, \ldots$ be a martingale sequence such that $|Z_k - Z_{k-1}| \le c_k$ for
each $k$, where $c_k$ may depend on $k$. Then, for all $t \ge 0$ and any $\lambda > 0$

\[ \Pr[|Z_t - Z_0| \ge \lambda] \le 2 \exp\left( - \frac{\lambda^2}{2 \sum_{k=1}^{t} c_k^2} \right) . \]

Using this inequality we prove the next theorem: 

Theorem 14. With probability $1 - 2e^{-k^2/8} - o(1/N)$ it holds for $d = c/N$ with $c = O(1)$
and $k > 0$ that

\[ |H^d - E[H^d]| < k \sqrt{2W/c} \cdot \sqrt{N} . \]

Proof. First we note that if $H^d \le t$ then $Z_t^d = H^d$ because after $t$ steps Algorithm
HF has certainly bisected all heavy nodes. Hence, only light nodes remain and we
know the exact value of $H^d$ since we have already seen all heavy nodes. Consequently,
$H^d = E[H^d \mid T_t] = Z_t^d$ in this case.

We have shown in Lemma 11 that $|Z_i^d - Z_{i+1}^d| \le 2 =: c_k$. Now we apply Theorem 13
for $\lambda = N^\gamma$ with $\gamma = 0.5 + \varepsilon$ ($\varepsilon > 0$). For $t' = O(N \log N)$ we obtain from Corollary 10
and Lemma 12:

\[ \Pr[|H^d - E[H^d]| \ge N^\gamma] \le \Pr[|H^d - E[H^d]| \ge N^\gamma \wedge H^d \le t'] + o(1/N) \]
\[ = \Pr[|Z_{t'}^d - Z_0^d| \ge N^\gamma \wedge H^d \le t'] + o(1/N) \]
\[ \le \Pr[|Z_{t'}^d - Z_0^d| \ge N^\gamma] + o(1/N) \]
\[ \le 2 \exp\left( - \frac{N^{2\gamma}}{8 t'} \right) + o(1/N) = o(1/N) . \]

Now we know that with high probability $H^d \le t'' := 2W/d - 1 + N^\gamma = (2W/c) N -
1 + N^\gamma$. If we apply Theorem 13 one more time using this estimate for $H^d$ and performing
similar calculations we get (for $N$ large enough)

\[ \Pr\left[ |H^d - E[H^d]| \ge k \sqrt{(2W/c) N} \right] \le \Pr\left[ |Z_{t''}^d - Z_0^d| \ge k \sqrt{(2W/c) N} \right] + o(1/N) \]
\[ \le 2 e^{-k^2/8} + o(1/N) . \]

The results for the random variable $H^d$ immediately yield the desired consequences
for the performance of Algorithm HF.

Theorem 3. If Algorithm HF starts with a problem of weight $W$ and produces a set $P$
of $N$ subproblems after $N - 1$ bisections then for $\varepsilon = 9\sqrt{\ln(N)/N}$

\[ \Pr\left[ (2 - \varepsilon) \frac{W}{N} \le W_{\max} \le (2 + \varepsilon) \frac{W}{N} \right] = 1 - o(1/N) , \]

where $W_{\max} := \max\{w(p) : p \in P\}$ denotes the weight of the heaviest subproblem
in $P$.

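Proof. First we show that $p^+ := \Pr\left[ W_{\max} \ge (2 + \varepsilon) \frac{W}{N} \right] = o(1/N)$.

For $c' := (2 + \varepsilon) W$ and $\beta := \frac{c'}{N}$ it holds that $p^+ = \Pr[W_{\max} \ge \beta] = \Pr[H^\beta >
N - 1]$, because Algorithm HF expands heavy subproblems before light subproblems.
Due to Theorem 7 we have $E[H^\beta] = \frac{2W}{\beta} - 1 = \frac{2}{2 + \varepsilon} N - 1$ and, thus, we can rewrite
$p^+$ as $p^+ = \Pr\left[ H^\beta - E[H^\beta] > \frac{\varepsilon}{2 + \varepsilon} N \right]$.

Since (for $N$ sufficiently large) $\frac{\varepsilon}{2 + \varepsilon} N \ge 3.6 \sqrt{\ln(N)} \cdot \sqrt{N}$ and $1 >
\sqrt{2/(2 + \varepsilon)} = \sqrt{2W/c'}$, we obtain using Theorem 14 for $k = 3.6 \sqrt{\ln(N)}$ that

\[ p^+ \le \Pr\left[ |H^\beta - E[H^\beta]| \ge k \sqrt{2W/c'} \cdot \sqrt{N} \right] = o(1/N) . \]

The proof for $p^- := \Pr\left[ W_{\max} < (2 - \varepsilon) \frac{W}{N} \right] = o(1/N)$ is similar.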

Simulation results complement Theorem 3 and show that also for small $N$ Algorithm
HF exhibits a good average-case behavior. Already for $N \approx 100$ the predicted
value for $W_{\max}$ from Theorem 3 and the 'real' value seen in simulations differ only by a
relative error of less than one percent. The standard deviation is tiny, as one could expect
due to Theorem 14.
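
A simulation of this kind is easy to set up. The sketch below is not the authors' experiment; it is a minimal illustration, with hypothetical names `simulate_hf` and `max_load_ratio`, that runs the heaviest-first loop on weights only and draws every bisection parameter from U[0, 1/2] using a seeded generator.

```python
import heapq
import random

def simulate_hf(n, seed=0, total_weight=1.0):
    """Run Algorithm HF with bisection parameters drawn from U[0, 1/2].

    Returns the weight of the heaviest of the n resulting subproblems.
    """
    rng = random.Random(seed)
    heap = [-total_weight]                 # max-heap via negated weights
    while len(heap) < n:
        w = -heapq.heappop(heap)           # heaviest remaining subproblem
        a = rng.uniform(0.0, 0.5)          # bisection parameter
        heapq.heappush(heap, -(a * w))
        heapq.heappush(heap, -((1.0 - a) * w))
    return -heap[0]

def max_load_ratio(n, seed=0):
    """Heaviest weight divided by the ideal share W/N (for W = 1)."""
    return simulate_hf(n, seed) * n
```

According to Theorem 3, `max_load_ratio(n)` should lie within $2 \pm 9\sqrt{\ln(n)/n}$ with high probability; averaging over several seeds gives an estimate of how close to 2 the ratio is for a given `n`.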

3.5 Simplified Parallel Version of Algorithm HF 

The proofs did not depend on the exact order in which Algorithm HF processes the nodes 
in the tree. We only made use of the fact that heavy nodes are processed before light 
nodes. Hence, it would suffice to run Algorithm HF until all subproblems are $d$-light
with $d = (2 + \varepsilon) \frac{W}{N}$ and $\varepsilon = 9\sqrt{\ln(N)/N}$. In this case we know that the number of
generated subproblems is at most N with high probability. This variant of Algorithm HF 
does not need priority queues and requires only constant time per iteration to find the 
node q which shall be bisected next. Furthermore, there is a natural parallel version of the 
modified Algorithm HF, because for each subproblem it can be decided independently 
if further bisections should be applied to it. The sharp concentration result shows that 
this modification makes no substantial difference for the quality of the load distribution. 
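
The threshold-based variant can be sketched as follows. This is an illustrative sequential sketch, not the parallel implementation: the name `split_until_light` is hypothetical, subproblems are represented by their weights, and bisection parameters are drawn from U[0, 1/2] as in the stochastic model. Each subproblem is examined on its own, so the work list could be processed in any order or in parallel.

```python
import math
import random

def split_until_light(total_weight, n, seed=0):
    """Bisect every subproblem until its weight is below d = (2 + eps) W / N.

    Returns the list of light subproblem weights and the threshold d.
    """
    rng = random.Random(seed)
    eps = 9.0 * math.sqrt(math.log(n) / n)
    d = (2.0 + eps) * total_weight / n
    pending = [total_weight]      # plain work list, no priority queue needed
    light = []
    while pending:
        w = pending.pop()
        if w < d:
            light.append(w)       # decided independently of other subproblems
            continue
        a = rng.uniform(0.0, 0.5)
        pending.append(a * w)
        pending.append((1.0 - a) * w)
    return light, d
```

By Theorem 7 the expected number of bisections is $2W/d - 1 = \frac{2}{2 + \varepsilon} N - 1$, so the number of generated subproblems stays below $N$ unless the run deviates far from its mean.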

4 Conclusion 

In this paper we have analyzed the Heaviest First Algorithm for dynamic load balancing.
From a practitioner's point of view this algorithm is applicable to a wide range of
problems, because it only depends on the fact that some bisection algorithm is available. 
Additionally, it is very easy to implement and can be efficiently parallelized. 

Our analysis focused on the average-case performance of Algorithm HF. Under 
rather pessimistic assumptions concerning the distribution of the bisection parameters we 
showed that the size of the maximum subproblem produced by Algorithm HF differs from 
the best attainable value only by a factor of roughly two. We believe that this provides 
a reasonable explanation for the good performance of Algorithm HF in simulations and 
practical applications. In order to improve the understanding of Algorithm HF further it 
would be interesting to transfer our results to more general distributions. 

Abstract. There is a lot of experimental evidence that crossover is, for some functions,
an essential operator of evolutionary algorithms. Nevertheless, it was an
open problem to prove for some function that an evolutionary algorithm using
crossover is essentially more efficient than evolutionary algorithms without crossover.
In this paper, such an example is presented and its properties are proved.



1 Introduction 

Stochastic search strategies have turned out to be efficient heuristic optimization techniques,
in particular, if not much is known about the structure of the function and intense
algorithmic investigations are not possible. The most popular among these algorithms are
simulated annealing (van Laarhoven and Aarts [12]) and evolutionary algorithms, which
come in great variety (evolutionary programming (Fogel, Owens, and Walsh [4]), genetic
algorithms (Holland [7], Goldberg [6]), evolution strategies (Schwefel [22])). There is a
lot of "experimental evidence" that these algorithms perform well in certain situations
but there is a lack of theoretical analysis. It is still a central open problem whether, for
some function, a simulated annealing algorithm with an appropriate cooling schedule is
more efficient than the best Metropolis algorithm, i.e., a simulated annealing algorithm
with fixed temperature (Jerrum and Sinclair [9]). Jerrum and Sorkin [10] perform an
important step towards an answer to this question. Here, we solve a central open problem
of similar flavor for genetic algorithms based on mutation, crossover, and fitness
based selection. All these three modules are assumed to be essential but this has not
been proved for crossover. Evolutionary algorithms without crossover are surprisingly
efficient. Juels and Wattenberg [11] report that even hill climbing (where the population
size equals 1) outperforms genetic algorithms on nontrivial test functions. For a function
called "long path", which was introduced by Horn, Goldberg, and Deb [8], Rudolph
[20] has proved that a hill climber performs at least comparably to genetic algorithms.
Indeed, the following central problem considered by Mitchell, Holland, and Forrest [15]
is open:

• Define a family of functions and prove that genetic algorithms are essentially better 
than evolutionary algorithms without crossover. 

* This work was supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the
Collaborative Research Center “Computational Intelligence” (531).

One cannot really doubt that such examples exist. Several possible examples have
been proposed. Forrest and Mitchell [5] report for the well-known candidate called
Royal Road function (Mitchell, Forrest, and Holland [14]) that some random mutation
hill climber outperforms genetic algorithms. So the problem is still open (Mitchell and
Forrest [13]). The difficulty lies in analyzing the consequences of crossover,
since crossover creates dependencies between the objects. Hence, the solution of the
problem is a necessary step to understand the power of the different genetic operators
and to build up a theory on evolutionary algorithms.

There are some papers dealing with the effect of crossover. Baum, Boneh, and Garrett
[1] use a very unusual crossover operator and a population of varying size, which
cannot really be called a genetic algorithm. Another approach is to try to understand crossover
without fitness based selection. Rabinovich, Sinclair, and Wigderson [18] model such
genetic algorithms as quadratic dynamical systems, and Rabani, Rabinovich, and Sinclair
[17] investigate the isolated effects of crossover for populations. These are valuable
fundamental studies. Here, we use a less general approach but we investigate a typical
genetic algorithm based on mutation, uniform crossover, and fitness based selection.
The algorithm is formally presented in Section 2. In Section 3, we prove, for the chosen
functions, that algorithms without crossover necessarily are slow and, in Section 4, we
prove that our genetic algorithm is much faster.

Definition 1. The function $\mathrm{JUMP}_{m,n} : \{0,1\}^n \to \mathbb{R}$ is defined by

\[
\mathrm{JUMP}_{m,n}(x_1, \ldots, x_n) =
\begin{cases}
m + \|x\|_1 & \text{if } \|x\|_1 \le n - m \text{ or } \|x\|_1 = n, \\
n - \|x\|_1 & \text{otherwise,}
\end{cases}
\]

where $\|x\|_1 = x_1 + \cdots + x_n$ denotes the number of ones in $x$.

The value of $\mathrm{JUMP}_m$ (the index n is usually omitted) grows linearly with the number
of ones in the input but there is a gap between the levels $n - m$ and $n$. We try to maximize
$\mathrm{JUMP}_m$. Then, inputs in the gap are the worst ones. We expect that we have to create
the optimal input $(1, 1, \ldots, 1)$ from inputs with $n - m$ ones. This “jump” is difficult for
mutations but crossover can help. More precisely, we prove that time $\Omega(n^m)$ is necessary
without crossover while a genetic algorithm can optimize $\mathrm{JUMP}_m$ with large probability
in time $O(n^2 \log n + 2^{2m} n \log n)$ and the same bound holds for the expected time. The
gap is polynomial for constant m and even superpolynomial for $m = \Theta(\log n)$.
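To make the shape of the function concrete, Definition 1 can be transcribed directly. The following is a minimal sketch, not part of the original analysis; the function name `jump` and the representation of x as a sequence of 0/1 integers are our own choices for illustration.

```python
from typing import Sequence


def jump(x: Sequence[int], m: int) -> int:
    """JUMP_{m,n}(x) for a bit string x of length n, following Definition 1."""
    n = len(x)
    ones = sum(x)  # ||x||_1, the number of ones in x
    if ones <= n - m or ones == n:
        return m + ones
    # Inside the gap (n - m < ||x||_1 < n) the value drops below every other level.
    return n - ones
```

For n = 10 and m = 3, the levels with 0, ..., 7 ones receive the values 3, ..., 10, the gap levels with 8 and 9 ones receive 2 and 1, and the all-ones string receives 13.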



2 Evolutionary Algorithms 

We discuss the main operators of evolutionary algorithms working on the state space
$S = \{0,1\}^n$ where we maximize a fitness function $f : S \to \mathbb{R}$. We use the operators
initialization, mutation, crossover, and selection.

X := initialize(S, s). Choose randomly and independently s objects from S to form
the population X.

(y, b) := mutate(X, p). Choose randomly an object $x \in X$ and, independently for
all positions $i \in \{1, \ldots, n\}$, set $y_i := 1 - x_i$ with probability p and $y_i := x_i$ otherwise.
Set b := 0, if x = y, and b := 1 otherwise.

The most common choice is p = 1/n ensuring that, on average, one bit of x is
flipped. The optimality of this choice has been proved for linear functions by Droste,
Jansen, and Wegener [3]. For some evolutionary algorithms it is not unusual to abstain
from crossover. Evolutionary programming (Fogel, Owens, and Walsh [4]) and evolution
strategies (Schwefel [22]) are examples. The so-called $(\mu + \lambda)$-evolution strategy works
with a population of size $\mu$. Then $\lambda$ children are created independently by mutation and
the best $\mu$ objects among the parents and children are chosen as next population. Ties
are broken arbitrarily.

Genetic algorithms typically use crossover. For the function at hand, uniform crossover
is appropriate.

(y, b) := uniform-crossover-and-mutate(X, p). Choose randomly and independently
$x', x'' \in X$ and, independently for all positions $i \in \{1, \ldots, n\}$, set $z_i := x'_i$
with probability 1/2 and $z_i := x''_i$ otherwise. Then y := mutate({z}, p). Set b := 0, if
$y \in \{x', x''\}$, and b := 1 otherwise.
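The two variation operators can be sketched as follows. This is an illustrative transcription under our own assumptions: objects are tuples of 0/1 integers, the population is a list (so duplicates are possible), and randomness comes from a `random.Random` instance passed in explicitly. None of these representation choices is prescribed by the text.

```python
import random
from typing import List, Sequence, Tuple

Bits = Tuple[int, ...]


def mutate(population: Sequence[Bits], p: float, rng: random.Random) -> Tuple[Bits, int]:
    """Pick a random object x and flip each bit independently with probability p."""
    x = rng.choice(list(population))
    y = tuple(1 - xi if rng.random() < p else xi for xi in x)
    b = 0 if y == x else 1  # b = 0 marks a replication of the parent
    return y, b


def uniform_crossover_and_mutate(
    population: Sequence[Bits], p: float, rng: random.Random
) -> Tuple[Bits, int]:
    """Uniform crossover of two randomly chosen parents, followed by mutation."""
    pop: List[Bits] = list(population)
    x1 = rng.choice(pop)
    x2 = rng.choice(pop)  # chosen independently, so x1 and x2 may coincide
    z = tuple(a if rng.random() < 0.5 else c for a, c in zip(x1, x2))
    y, _ = mutate([z], p, rng)
    b = 0 if y in (x1, x2) else 1  # b = 0 if the child replicates one of its parents
    return y, b
```

With p = 0 both operators can only return replications (b = 0 whenever both parents agree), which is the case that steady state selection later rejects.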

Ronald [19] suggests avoiding duplicates in the population in order to prevent
populations with many indistinguishable objects. We adopt this idea and only prevent
replications (see below). Moreover, we use a variant of genetic algorithms known as
steady state (Sarma and De Jong [21]). This simplifies the analysis, since, in one step,
only one new object is created. Now we are able to describe our algorithm.

Algorithm 1.

1. X := initialize($\{0,1\}^n$, n).

2. Let r be a random number from [0, 1] (uniform distribution).

3. If $r \le 1/(n \log n)$,

(y, b) := uniform-crossover-and-mutate(X, 1/n).

4. If $r > 1/(n \log n)$,

(y, b) := mutate(X, 1/n).

5. Choose randomly one of the objects $x \in X$ with smallest $f$-value.

6. If b = 1 and $f(y) \ge f(x)$,

X := (X − {x}) ∪ {y}.

7. Return to Step 2.
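For experiments, Algorithm 1 can be simulated along the following lines. The sketch is self-contained and makes several assumptions that the text leaves open: the logarithm in the crossover probability is taken to be the natural logarithm, the run is cut off after a hypothetical `max_steps` bound, and the returned counter only counts objects created in Steps 3 and 4 (not the n objects of the initial population).

```python
import math
import random
from typing import List, Optional, Tuple

Bits = Tuple[int, ...]


def jump(x: Bits, m: int) -> int:
    """JUMP_m as in Definition 1."""
    n, ones = len(x), sum(x)
    return m + ones if (ones <= n - m or ones == n) else n - ones


def mutate(pop: List[Bits], p: float, rng: random.Random) -> Tuple[Bits, int]:
    x = rng.choice(pop)
    y = tuple(1 - xi if rng.random() < p else xi for xi in x)
    return y, int(y != x)


def crossover_and_mutate(pop: List[Bits], p: float, rng: random.Random) -> Tuple[Bits, int]:
    x1, x2 = rng.choice(pop), rng.choice(pop)
    z = tuple(a if rng.random() < 0.5 else c for a, c in zip(x1, x2))
    y, _ = mutate([z], p, rng)
    return y, int(y not in (x1, x2))


def algorithm_1(n: int, m: int, max_steps: int, seed: int = 0) -> Optional[int]:
    """Number of created objects until the optimum appears, or None if cut off."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(n)]  # Step 1
    if any(sum(x) == n for x in pop):
        return 0
    p_cross = 1.0 / (n * math.log(n))  # assumption: natural logarithm, n >= 2
    for step in range(1, max_steps + 1):
        r = rng.random()  # Step 2
        if r <= p_cross:  # Step 3
            y, b = crossover_and_mutate(pop, 1.0 / n, rng)
        else:  # Step 4
            y, b = mutate(pop, 1.0 / n, rng)
        if sum(y) == n:
            return step
        fitness = [jump(x, m) for x in pop]
        worst = min(fitness)
        idx = rng.choice([i for i, fx in enumerate(fitness) if fx == worst])  # Step 5
        if b == 1 and jump(y, m) >= fitness[idx]:  # Step 6: replications are rejected
            pop[idx] = y
    return None
```

A call such as `algorithm_1(50, 3, 10**6)` corresponds to the small parameters mentioned in the conclusion; the run time of this naive simulation grows quickly because the fitness of the whole population is recomputed in every step.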

Steps 5 and 6 are called steady state selection preventing replications. We do not
care about the choice of an appropriate stopping rule and use, like most authors,
the following complexity measure: we count the number of evaluations of $f$ on created
objects until an optimal object is created. In the following we discuss in detail the
evolution strategies described above and the genetic algorithm described in Algorithm 1.
We are able to obtain similar results for the following variants of the algorithm (details
can be found in the full version).

1. Evolutionary algorithms without crossover may use subpopulations which work 
independently for some time and may exchange information sometimes. 

2. Evolutionary algorithms without crossover as well as the genetic algorithm may 
choose objects based on their fitness (for mutation, crossover, and/or selection) as 
long as objects with higher fitness get a better chance to be chosen and objects with 
the same fitness get the same chance to be chosen. 

3. The genetic algorithm may refuse to include any duplicate into the population.

4. The genetic algorithm may accept replications as well as duplicates. In this case,
all our results qualitatively still hold. The actual size of the upper bounds for the
genetic algorithm changes in this case, though. At the end of Section 4 we discuss
this in more detail.

5. The genetic algorithm may replace the chosen object by y even if $f(y) < f(x)$.

3 Evolutionary Algorithms without Crossover on $\mathrm{JUMP}_m$

Evolutionary algorithms without crossover create new objects by mutations only. If x
contains i zeros, the probability of creating the optimal object equals $p^i (1 - p)^{n-i}$. Let
$m \le (\frac{1}{2} - \varepsilon) n$. It follows by Chernoff's inequality that, for populations of polynomial
size, the probability to have an object x, where $\|x\|_1 > n - m$, in the first population is
exponentially small. Then, the expected time to reach the optimum is bounded below by
$t_{m,n} = \min\{p^{-i} (1 - p)^{i-n} \mid m \le i \le n\}$. This holds since we do not select objects x
where $n - m < \|x\|_1 < n$. It is obvious that $t_{m,n} = \Theta(n^m)$, if p = 1/n. The following
result follows by easy calculations.

Proposition 2. Let $m \le (\frac{1}{2} - \varepsilon) n$ for some constant $\varepsilon > 0$. Evolutionary algorithms
without crossover need expected time $\Omega(n^m)$ to optimize $\mathrm{JUMP}_m$ if mutations flip bits
with probability 1/n. For each mutation probability p, the expected time is …
for each constant c > 0.

Droste, Jansen, and Wegener [2] have proved that, for population size 1 and p = 1/n,
the expected time of an evolutionary algorithm on $\mathrm{JUMP}_m$, $m > 1$, equals $\Theta(n^m)$.
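The order of magnitude of the waiting time $t_{m,n}$ can be checked numerically. The sketch below evaluates the minimum from the text directly; the function name is our own, and the comparison with $n^m$ and $e \cdot n^m$ in the usage note relies on the elementary fact that $(1 - 1/n)^{n-1} \ge 1/e$.

```python
def waiting_time_bound(n: int, m: int, p: float) -> float:
    """t_{m,n} = min over m <= i <= n of p^(-i) * (1 - p)^(i - n).

    The term for i is the expected number of mutations of a fixed object with
    i zeros until all i zeros (and no other bit) are flipped.
    """
    return min(p ** (-i) * (1.0 - p) ** (i - n) for i in range(m, n + 1))
```

For p = 1/n the minimum is attained at i = m and equals $n^m (1 - 1/n)^{-(n-m)}$, which lies between $n^m$ and $e \cdot n^m$; for n = 50 and m = 3 this is already more than 125000 mutations.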



4 The Genetic Algorithm as Optimizer of $\mathrm{JUMP}_m$

The main result of this paper is the following theorem. 

Theorem 3. Let m be a constant. For each constant $k \in \mathbb{N}$ the genetic algorithm creates
an optimal object for $\mathrm{JUMP}_m$ within $O(n^2 \log n)$ steps with probability $1 - \Omega(n^{-k})$.
With probability $1 - e^{-\Omega(n)}$ the genetic algorithm creates an optimal object for $\mathrm{JUMP}_m$
within $O(n^3)$ steps.

Proof. Our proof strategy is the following. We consider different phases of the algorithm
and “expect” in each phase a certain behavior. If a phase does not fulfill our expectation,
we estimate the probability of such a “failure” and may start the next phase under the
assumption that no failure has occurred. We also assume to have not found an optimal
object, since otherwise we are done. Finally, the failure probability can be estimated
by the sum of the individual failure probabilities. The constants $c_1, c_2, c_3 > 0$ will be
chosen appropriately.

Phase 0: Initialization. We expect to obtain only objects x where $\|x\|_1 \le n - m$ or
$\|x\|_1 = n$.

Phase 1: This phase has length $c_1 n^2 \log n$, if we want $O(n^{-k})$ as upper bound for the
probability of a failure. Setting its length to $c_1 n^3$, we can guarantee an error bound of
even $e^{-\Omega(n)}$. We expect to create an optimal object or to finish with n objects with $n - m$
ones.

Remark: If Phase 1 is successful, the definition of the genetic algorithm ensures that the
property is maintained forever.

Phase 2: This phase has length $c_2 n^2 \log n$. We expect to create an optimal object or to
finish with objects with $n - m$ ones where the zeros are not too concentrated. More
precisely, for each bit position i, there are at most … of the n objects with a zero at
position i.

Phase 3: This phase has length $c_3 n^2 \log n$. We expect that, as long as no optimal object
is created, the n objects contain at each position $i \in \{1, \ldots, n\}$ altogether at most …
zeros. Moreover, we expect to create an optimal object.

Analysis of Phase 1. We apply results on the coupon collector's problem (see Motwani
and Raghavan [16]). There are N empty buckets and balls are thrown randomly and
independently into the buckets. Then the probability that, after $(\beta + 1) N \ln N$ throws,
there is still an empty bucket is at most $N^{-\beta}$. This result remains true if, between the throws,
we may rename the buckets.
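The coupon collector estimate follows from a union bound over the buckets: a fixed bucket stays empty after T throws with probability $(1 - 1/N)^T \le e^{-T/N}$, and for $T = (\beta + 1) N \ln N$ summing over N buckets gives at most $N^{-\beta}$. The sketch below, with function names of our own choosing, evaluates this union bound numerically.

```python
import math


def empty_bucket_union_bound(num_buckets: int, throws: int) -> float:
    """Union bound on the probability that some bucket is still empty."""
    return num_buckets * (1.0 - 1.0 / num_buckets) ** throws


def throws_needed(num_buckets: int, beta: float) -> int:
    """(beta + 1) * N * ln N throws, rounded up to an integer."""
    return math.ceil((beta + 1.0) * num_buckets * math.log(num_buckets))
```

For example, with N = 100 buckets and $\beta = 2$, the union bound after `throws_needed(100, 2)` throws is below $100^{-2}$.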

We consider the bit positions of the n objects as buckets, so we have $N = n^2$.
Buckets corresponding to zeros are called empty. The genetic algorithm never increases
the number of zeros. Hence, we slow down the process by ignoring the effect of new
objects created by crossover or by a mutation flipping more than one bit. We further slow
down the process by changing the fitness to $\|x\|_1$ and waiting for ones at all positions. If
a mutation flips a single bit from 0 to 1, we obtain a better object which is chosen. The
number of empty buckets decreases at least by 1. If a single bit flips from 1 to 0, we ignore
possible positive effects (perhaps we replace a much worse object). Hence, by the result
on the coupon collector's problem, the failure probability is bounded by $N^{-k} = n^{-2k}$
after $(k + 1) N \ln N = 2(k + 1) n^2 \ln n$ good steps and bounded by $e^{-n}$ after
$((\sqrt{N} / \ln N) + 1) N \ln N = n^3 + 2 n^2 \ln n$ good steps.

A step is not good if we choose crossover (probability $1/(n \log n)$) or we flip not
exactly one bit. The probability of the last event equals $1 - n \cdot \frac{1}{n} \cdot (1 - \frac{1}{n})^{n-1}$ and is
bounded by a constant $\alpha < 1$. Hence, by Chernoff's bound, we can bound the probability
of having enough good steps among $c_1 n^2 \log n$ (or $c_1 n^3$) steps by $1 -$ … if $c_1$ is
large enough.
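The constant bounding the probability of not flipping exactly one bit can be made explicit: since $(1 - 1/n)^{n-1} \ge 1/e$, the probability of not flipping exactly one bit is at most $1 - 1/e$. The following short check, with a function name of our own, evaluates the expression from the text.

```python
import math


def prob_not_exactly_one_flip(n: int) -> float:
    """1 - n * (1/n) * (1 - 1/n)^(n-1) for mutation probability 1/n."""
    p = 1.0 / n
    exactly_one = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - exactly_one
```

The value increases with n and approaches $1 - 1/e \approx 0.632$ from below.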

Analysis of Phase 2. We have n objects with m zeros each. We cannot prove that 
the mn zeros are somehow nicely distributed among the positions. Good objects tend 
to create similar good objects, at least in the first phase. The population may be “quite 
concentrated” at the end of the first phase. Then, crossover cannot help. We prove that 
mutations ensure in the second phase that the zeros become “somehow distributed”. 

We only investigate the first position and later multiply the failure probability by n to
obtain a common result for all positions. Let z be the number of zeros in the first position
of the objects. Then $z \le n$ in the beginning and we claim that, with high probability,
$z \le$ … at the end. We look for an upper bound $p^+(z)$ on the probability to increase
the number of zeros in one round and for a lower bound $p^-(z)$ on the probability to
decrease the number of zeros in one round. The number of zeros at a fixed position can
change at most by 1 in one round. As long as we do not create an optimal object, all
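objects contain $n - m$ ones.

Let $A^+$ resp. $A^-$ be the event that (given z) the number of zeros (at position 1)
increases resp. decreases in one round. Then

\[
A^+ \subseteq B \cup \Bigl( \bar{B} \cap C \cap \Bigl[ \Bigl( D \cap E \cap \bigcup_{1 \le i \le m-1} F_i^+ \Bigr) \cup \Bigl( \bar{D} \cap \bar{E} \cap \bigcup_{1 \le i \le m} G_i^+ \Bigr) \Bigr] \Bigr)
\]

with the following meaning of the events:

- $B$: crossover is chosen as operator, Prob($B$) = $1/(n \log n)$.

- $C$: an object with a one at position 1 is chosen for replacement, Prob($C$) = $(n - z)/n$.

- $D$: an object with a zero at position 1 is chosen for mutation, Prob($D$) = $z/n$.

- $E$: the bit at position 1 does not flip, Prob($E$) = $1 - 1/n$.

- $F_i^+$: there are exactly i positions among the $(m - 1)$ 0-positions $j \ne 1$ which flip
and exactly i positions among the $(n - m)$ 1-positions which flip, Prob($F_i^+$) =
$\binom{m-1}{i} \binom{n-m}{i} (1/n)^{2i} (1 - 1/n)^{n-2i-1}$. We can exclude the case i = 0 which leads
to a replication.

- $G_i^+$: there are exactly i positions among the m 0-positions which flip and exactly
$i - 1$ positions among the $(n - m - 1)$ 1-positions $j \ne 1$ which flip, Prob($G_i^+$) =
$\binom{m}{i} \binom{n-m-1}{i-1} (1/n)^{2i-1} (1 - 1/n)^{n-2i}$.

Hence,

\[
p^+(z) \le \frac{1}{n \log n} + \Bigl(1 - \frac{1}{n \log n}\Bigr) \frac{n - z}{n} \Bigl[ \frac{z}{n} \Bigl(1 - \frac{1}{n}\Bigr) \sum_{i=1}^{m-1} \mathrm{Prob}(F_i^+) + \frac{n - z}{n} \cdot \frac{1}{n} \sum_{i=1}^{m} \mathrm{Prob}(G_i^+) \Bigr].
\]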




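Similarly, we get

\[
A^- \supseteq \bar{B} \cap \bar{C} \cap \Bigl[ \Bigl( D \cap \bar{E} \cap \bigcup_{1 \le i \le m} F_i^- \Bigr) \cup \Bigl( \bar{D} \cap E \cap \bigcup_{1 \le i \le m} G_i^- \Bigr) \Bigr]
\]

where

- $F_i^-$: there are exactly $i - 1$ positions among the $(m - 1)$ 0-positions $j \ne 1$ which
flip and exactly i positions among the $(n - m)$ 1-positions which flip, Prob($F_i^-$) =
$\binom{m-1}{i-1} \binom{n-m}{i} (1/n)^{2i-1} (1 - 1/n)^{n-2i}$.

- $G_i^-$: there are exactly i positions among the m 0-positions which flip and exactly i
positions among the $(n - m - 1)$ 1-positions $j \ne 1$ which flip (if i = 0, we get the
case of a replication), Prob($G_i^-$) = $\binom{m}{i} \binom{n-m-1}{i} (1/n)^{2i} (1 - 1/n)^{n-2i-1}$.

Hence,

\[
p^-(z) \ge \Bigl(1 - \frac{1}{n \log n}\Bigr) \frac{z}{n} \Bigl[ \frac{z}{n} \cdot \frac{1}{n} \sum_{i=1}^{m} \mathrm{Prob}(F_i^-) + \frac{n - z}{n} \Bigl(1 - \frac{1}{n}\Bigr) \sum_{i=1}^{m} \mathrm{Prob}(G_i^-) \Bigr].
\]
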
Since m is a constant, we obtain, if $z \ge$ …, that $p^-(z) \ge p^-(z) - p^+(z) = \Omega(1/n)$
and it is also easy to see that $p^-(z) = O(1/n)$.

We call a step essential, if the number of zeros in the first position changes. The
length of Phase 2 is $c_2 n^2 \log n$. The following considerations work under the assumption
$z \ge$ …. The probability that a step is essential is …. Hence, for some $c > 0$,
the probability of having less than $c\, n \log n$ essential steps, is bounded by Chernoff's
bound by …. We assume that this failure does not occur. Let $q^+(z)$ resp. $q^-(z)$ be the
conditional probability of increasing resp. decreasing the number of zeros in essential
steps. Then $q^+(z) = p^+(z)/(p^+(z) + p^-(z))$, $q^-(z) = p^-(z)/(p^+(z) + p^-(z))$,
and $q^-(z) - q^+(z) = (p^-(z) - p^+(z))/(p^+(z) + p^-(z)) = \Omega(1)$. Hence, for some
$c_2' > 0$, the probability of decreasing the number of zeros by less than $c_2' n$ is bounded by
Chernoff's bound by …. We obtain $c_2' = 1$ by choosing $c_2$ large enough. But this
implies that we have at some point of time less than $z^* =$ … zeros at position 1 and
our estimations on $p^+(z)$ and $p^-(z)$ do not hold. We investigate the last point of time
with $z^*$ zeros at position 1. Then there are t essential steps left. If $t \le$ … it is sure
that we stop with at most … zeros at position 1. If $t >$ …, we can apply Chernoff's
bound and obtain a failure probability of …. Altogether the failure probability is
$n e^{-\Omega(n)} = e^{-\Omega(n)}$.

Analysis of Phase 3. First, we investigate the probability that the number of zeros
at position 1 reaches …. For this purpose, we consider subphases starting at points
of time where the number of zeros equals …. A subphase where the number of zeros
is less than … cannot cause a failure. The same holds for subphases whose length is
bounded by …. In all other cases we can apply Chernoff's bound and the assumption
$z \ge$ …. Hence, the failure probability for each subphase is bounded by … and
the same holds for all subphases and positions altogether. We create an optimal object
if we perform crossover (probability $1/(n \log n)$), if the following mutation does not flip any
bit (probability $(1 - \frac{1}{n})^n$), if the chosen bit strings do not share a zero at some position
(probability at least …, see below) and the crossover chooses at each of the 2m positions,
where the objects differ, the object with the one at this position (probability $(\frac{1}{2})^{2m}$). We
prove the open claim. We fix one object of the population. The m zeros are w.l.o.g.
at the positions $1, \ldots, m$. There are at most … objects with a zero at some fixed
position $j \in \{1, \ldots, n\}$, altogether at most … colliding objects. Hence, the probability
of choosing a second object without collision with the first one is at least …. Hence,
the success probability is at least … and the failure probability for
… $\log n$ steps is bounded by ….

Combining all estimations we have proved the theorem. □ 



Corollary 4. Let m be a constant. The expected time of the genetic algorithm on $\mathrm{JUMP}_m$
is bounded by $O(n^2 \log n)$.

Proof. We remark that Theorem 3 can be proved for arbitrary starting populations instead
of random ones. The analysis of Phase 1 can be easily adjusted to populations that may
contain bit strings with at least one but less than m zeros. We note that the number of
such bit strings can never be increased. It follows that the probability of a failure in
Phase 1 is still bounded by $O(n^{-k})$, if we choose the length of this phase as $c_1 n^2 \log n$.
Therefore, the sum of all failure probabilities can be bounded by $O(1/n)$. In case that
after one superround, i.e. the four phases, the optimum is not found, the process is
repeated starting with the current population at the end of Phase 3. The expected number
of repetitions is $(1 - O(1/n))^{-1} = O(1)$, implying that the expected running time is
bounded by $O(n^2 \log n)$. □

In the following, we generalize our results to the case $m = O(\log n)$ where we
reduce the crossover probability to ^ ^ . Nothing has to be changed for Phase 1 (and 

Phase 0). In Phase 2, we obtain (z) — p+( 2 ) = l^( „iog 2 „ ), P^{z) = 0(^^|^),and 
q^{z) — q^{z) = > g^n. Then 0(n^ log® n) steps are enough to obtain 

the desired properties with a probability bounded by ^ for eaeh d < 1. The same 

arguments work for the first property of Phase 3 as long as the length is polynomially 
bounded. In order to have an exponentially small failure probability for the event to create 
an optimal object, we increase the number of steps to 6>(n^2^™). Ifwe are satisfied with a 
constant success probability, 0(n(log® n)2^™) steps are sufficient. We summarize these 
considerations. 

Theorem 5. Let m = 0(log n). For each constant d < 1, with probability 1 — 

the genetic algorithm creates an optimal object for JUMPm within O (n® + (log® n + 

2^™)) steps. The expected run time is bounded by 

0{n log® n(n log^ n + 2^™)) . 

If we allow replications as well as duplicates things change a little. We concentrate
on the case where m is a constant. Nothing changes for Phase 0. Our analysis of Phase 1
remains valid, too. We remark that replications occur with probability at least
$(1 - \frac{1}{n \log n})(1 - \frac{1}{n})^n$, so the tendency of good objects to create similar or equal objects
in the first phase is enlarged. We change the length of Phase 2 to $a(n) \cdot c_2 n^2 \log n$ and
discuss the role of $a(n)$ later. We have to adapt our considerations to the circumstance that
replications are allowed. For $A^+$ we have to include the event $F_0^+$, for $A^-$ we include $G_0^-$.
We still have $p^-(z) \ge p^-(z) - p^+(z) = \Omega(1/n)$, but $p^-(z) = O(1/n)$ does
not hold, now. We consider only essential steps that still occur with probability 
Let again q^{z) = p^{z)/{p^{z) +p^^{z)) resp. q^{z) = p'^ (z) / {p^ (z) +p+( 2 :)) be 
the conditional probabilities for decreasing resp. increasing the number of zeros in one 
essential step. Now, we have q~{z) — q^{z) = If in exactly d of a(n) • logn 

essential steps the number of zeros is decreased, we end up with z+a{n)- log n — 2d 

zeros. Applying Chernoff's inequality yields that the probability not to decrease the
number of zeros to at most -^n in a(n) • log n essential steps is logn)^ 

Choosing a(n) = yields that with probability 1 — the number of zeros 

is at most at all positions after Phase 2, which now has a length of C 2 n^ steps. With 
a(n) = log n, Phase 2 needs only C 2 n^ log^ n steps, but the probability of a failure is 
increased to which is still subpolynomial. 

In Phase 3 replications change the probabilities for changing the number of zeros
in the same way as in Phase 2. Therefore, we can adapt our proof to the modified
algorithm the same way as we did for Phase 2. Since we are satisfied if the number of
zeros is not increasing too much, compared to Phase 2, where we need a decrease,
it is not necessary to adjust the length of Phase 3. In order to have the bounds for the
probability of a failure in Phase 1 and Phase 2 in the same order of magnitude, we choose
$c_1 a(n) n^2 \log n$ as length of Phase 1. We conclude that a genetic algorithm that allows
replications finds an optimum of $\mathrm{JUMP}_m$, for constant m, in $O(a(n) n^2 \log n)$ steps with
probability $1 -$ …. We remark that the analysis of this variant of a genetic
algorithm can be adapted to $m = O(\log n)$, too.

5 Conclusion 

Evolutionary and genetic algorithms are often used in applications but the theory on 
these algorithms is in its infancy. In order to obtain a theory on evolutionary and genetic 
algorithms, one has to understand the main operators. This paper contains the first proof 
that, for some function, genetic algorithms with crossover can be much more efficient 
(polynomial versus superpolynomial) than all types of evolutionary algorithms without 
crossover. The specific bounds are less important than the fact that we have analytical 
tools to prove such a result. The difference in the behavior of the algorithms can be 
recognized in experiments already for small parameters, e. g. n = 50 and m = 3.

Abstract. We present a complete analysis of the statistics of number of occurrences
of a regular expression pattern in a random text. This covers “motifs” widely
used in computational biology. Our approach is based on: (i) classical constructive
results in theoretical computer science (automata and formal language theory); (ii)
analytic combinatorics to compute asymptotic properties from generating functions;
(iii) computer algebra to determine generating functions explicitly, analyse
generating functions and extract coefficients efficiently. We provide constructions
for overlapping or non-overlapping matches of a regular expression. A companion
implementation produces: multivariate generating functions for the statistics under
study; a fast computation of their Taylor coefficients which yields exact values
of the moments with typical application to random texts of size 30,000; precise
asymptotic formulae that allow predictions in texts of arbitrarily large sizes. Our
implementation was tested by comparing predictions of the number of occurrences
of motifs against the 7 megabytes aminoacid database Prodom. We handled
more than 88% of the standard collection of Prosite motifs with our programs.
Such comparisons help detect which motifs are observed in real biological data
more or less frequently than theoretically predicted.



1 Introduction 

The purpose of molecular biology is to establish relations between chemical form and
function in living organisms. From an abstract mathematical or computational standpoint,
this gives rise to two different types of problems: processing problems that, broadly
speaking, belong to the realm of pattern-matching algorithmics, and probabilistic
problems aimed at distinguishing between what is statistically significant and what is
not, at discerning “signal” from “noise”. The present work belongs to the category of
probabilistic studies originally motivated by molecular biology. As we shall see, however,
the results are of a somewhat wider scope.

Fix a finite alphabet, and take a large random text (a sequence of letters from the
alphabet), where randomness is defined by either a Bernoulli model (letters are drawn
independently) or a Markov model. Here, a pattern is specified by an unrestricted regular
expression R and occurrences anywhere in a text file are considered. The problem is
to quantify precisely what to expect about the number of occurrences of pattern R in a
random text of size n. We are interested first of all in moments of the distributions (what
is the mean and the variance?) but also in asymptotic properties of the distribution
(does the distribution have a simple asymptotic form?) as well as in computational
aspects (are the characteristics of the distribution effectively accessible?).

We provide positive answers to these three questions. Namely, for all “non-degenerate”
pattern specifications* R, we establish the following results: (i) The number of
occurrences has a mean of the form $\mu \cdot n + O(1)$, with a standard deviation that is of order
$\sqrt{n}$; in particular, concentration of distribution holds; (ii) The number of occurrences,
once normalized by the mean and standard deviation, obeys in the asymptotic limit a
Gaussian law; (iii) The characteristics of the distribution are effectively computable,
both exactly and asymptotically, given basic computer algebra routines. The resulting
procedures are capable of treating fairly large “real-life” patterns in a reasonable amount
of time.

Though initially motivated by computational biology considerations, these results 
are recognizably of a general nature. They should thus prove to be of use in other areas, 
most notably, the analysis of complex string matching algorithms, large finite state 
models of computer science and combinatorics, or natural language studies. (We do not 
however pursue these threads here and stay with the original motivation provided by 
computational biology.) 

The basic mathematical objects around which the paper is built are counting generating
functions. In its bivariate version, such a generating function encodes exactly
all the information relative to the frequency of occurrence of a pattern in random texts
of all sizes. We appeal to a combination of classical results from the theory of regular
expressions and languages and from basic combinatorial analysis (marking by auxiliary
variables) in order to determine such generating functions systematically. Specifically,
we use a chain from regular expression patterns to bivariate generating functions that
goes through nondeterministic and deterministic finite automata. Not too unexpectedly,
the generating functions turn out to be rational (Th. 1), but also computable at a reasonable
cost for most patterns of interest (§6). Since coefficients of univariate rational GF's
are computable in $O(\log n)$ arithmetic operations, this provides the exact statistics of
matches in texts of several thousands positions in a few seconds, typically. Also, asymptotic
analysis of the coefficients of rational functions can be performed efficiently [13].
Regarding multivariate asymptotics, a perturbation method from analytic combinatorics
then yields the Gaussian law (Th. 2).
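The $O(\log n)$ claim for coefficients of rational generating functions can be illustrated by binary powering of a companion matrix. The sketch below is not the Maple implementation referred to in the paper; it assumes that the denominator has been normalized to $1 - q_1 z - \cdots - q_d z^d$, so that the coefficients satisfy $a_k = q_1 a_{k-1} + \cdots + q_d a_{k-d}$ for $k \ge d$, and that the first d coefficients are given. All names are our own.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m: Matrix, e: int) -> Matrix:
    """Binary powering: O(log e) matrix multiplications."""
    size = len(m)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    while e > 0:
        if e & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        e >>= 1
    return result


def coefficient(q: Sequence, init: Sequence, n: int) -> Fraction:
    """a_n for the recurrence a_k = q[0]*a_{k-1} + ... + q[d-1]*a_{k-d}, k >= d."""
    d = len(q)
    if n < d:
        return Fraction(init[n])
    # Companion matrix acting on the state (a_k, a_{k-1}, ..., a_{k-d+1}).
    companion = [[Fraction(c) for c in q]] + [
        [Fraction(int(j == i)) for j in range(d)] for i in range(d - 1)
    ]
    power = mat_pow(companion, n - d + 1)
    state = [Fraction(init[d - 1 - j]) for j in range(d)]
    return sum((power[0][j] * state[j] for j in range(d)), Fraction(0))
```

As a small usage example, the probability that a uniformly random binary text of length n contains no occurrence of the word 11 satisfies $a_n = \frac{1}{2} a_{n-1} + \frac{1}{4} a_{n-2}$ with $a_0 = a_1 = 1$, so `coefficient([Fraction(1, 2), Fraction(1, 4)], [1, 1], 3)` gives 5/8. For a fixed denominator degree d the number of arithmetic operations is $O(d^3 \log n)$.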

In the combinatorial world, the literature on pattern statistics is vast. It originates largely
with the introduction of correlation polynomials by Guibas and Odlyzko [14] in the
case of patterns defined by one word. The case of several words in Bernoulli or Markov
texts was studied by many authors, including [14,10,4,24,25]; see also the review in [31,
Chap. 12]. As a result of these works, the number of occurrences of any finite set of patterns
in a random Bernoulli or Markov text is known to be asymptotically normal. Several
other works are motivated by computational biology considerations [20,28,22,26,29,1].



* Technically, non-degeneracy is expressed by the “primitivity” condition of Th. 2. All cases of
interest can be reduced to this case; see the discussion at the end of §4.

Our distributional results that deal with arbitrary regular expression patterns, including
infinite word sets, extend the works of these authors.

The effective character of our results is confirmed by a complete implementation
based on symbolic computation, the Maple system in our case. Our implementation has
been tested against real-life data provided by a collection of patterns, the frequently used
Prosite collection^ [2]. We apply our results to compute the statistics of matches and
compare with what is observed in the Prodom database^.

In its most basic version, string-matching considers one or a few strings that are
searched for in the text. Motifs appear in molecular biology as signatures for families of
similar sequences and they characterize structural functionalities of sequences derived
from a common ancestor. For instance, a typical motif of Prosite is [LIVM](2)-x-D-
D-x(2,4)-D-x(4)-R-R-[GH], where the capital letters represent aminoacids, ‘x’ stands
for any letter, brackets denote a choice and parentheses a repetition. Thus x(2,4) means
two to four consecutive arbitrary aminoacids, while [LIVM](2) means two consecutive
elements of the set {L,I,V,M}. Put otherwise, a motif is a regular expression of a restricted
form that may be expanded, in principle at least, into a finite set of words. Our analysis
that addresses general regular expression patterns, including a wide class of infinite sets
of words, encompasses the class of all motifs.
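The correspondence between motifs and regular expressions can be made explicit by a small translator. The sketch below handles only the constructs described above (single letters, ‘x’, bracketed choices, and repetition counts in parentheses); the function name is hypothetical, further Prosite syntax is deliberately not covered, and ‘x’ is mapped to the regular expression wildcard `.`, which matches more than the 20 aminoacid letters.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(motif: str) -> str:
    """Translate a motif such as '[LIVM](2)-x-D' into a Python regular expression."""
    parts = []
    for element in motif.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported motif element: " + element)
        atom, low, high = match.groups()
        if atom == "x":
            atom = "."  # any letter
        if low is None:
            repetition = ""
        elif high is None:
            repetition = "{" + low + "}"
        else:
            repetition = "{" + low + "," + high + "}"
        parts.append(atom + repetition)
    return "".join(parts)
```

Applied to the motif quoted above, the translator yields `[LIVM]{2}.DD.{2,4}D.{4}RR[GH]`, which makes the finite expansion into words visible: each bounded repetition contributes finitely many alternatives.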

On the practical side, it is worthwhile to remark that the automaton description for a
motif tends to be much more compact than what would result from the expansion of the
language described by the motif, allowing for an exponential reduction of size in many 
cases. For instance, for motif PS00844 from Prosite our program builds an automaton 
which has 946 states while the number of words of the finite language generated by 
the motif is about 2 x 10^®. In addition, regular expressions are able to capture long 
range dependencies, so that their domain of application goes far beyond that of standard 
motifs. 

Contributions of the paper. This work started when we realized that computational biology
was commonly restricting attention to what seemed to be an unnecessarily constrained
class of patterns. Furthermore, even on this restricted class, the existing literature
often had to rely on approximate probabilistic models. This led to the present work that
demonstrates, both theoretically and practically, that a more general framework is fully
workable. On the theory side, we view Th. 2 as our main result, since it appears to generalize
virtually everything that is known regarding probabilities of pattern occurrences.
On the practical side, the fact that we can handle in an exact way close to 90% of the
motifs of a standard collection that is of common use in biological applications probably
constitutes the most striking contribution of the paper.



2 Main Results 

We consider the number of occurrences of a pattern (represented by a fixed given regular 
expression R) in a text under two different situations: in the overlapping case, all the 

^ At the moment, Prosite comprises some 1,200 different patterns, called “motifs”, that are regular
expressions of a restricted form and varying structural complexity.

^ Prodom is a compilation of “homologous” domains of proteins in Swiss-Prot, and we use it 
as a sequence of length 6,700,000 over the alphabet of aminoacids that has cardinality 20. 
positions in the text where a match with the regular expression can occur are counted
(once); in the non-overlapping case, the text is scanned from left to right, and every 
time a match is found, the count is incremented and the search starts afresh at this 
position. These cases give rise to two different statistics for the number of matches 
in a random text of size n, and we handle both of them. Without loss of generality, we 
assume throughout that R does not contain the empty word $\varepsilon$.

In each context, the method we describe gives an algorithm to compute the bivariate
probability generating function $P(z,u) = \sum_{n,k \ge 0} p_{n,k}\, u^k z^n$, with
$p_{n,k} = \Pr\{X_n = k\}$. This generating function specializes in various ways. First, $P(z,0)$
is the probability generating function of texts that do not match against the motif,
while $R(z) = 1/(1 - z) - P(z,0)$ is the probability generating function of texts
with at least one occurrence. More generally, the coefficient $[u^k]P(z,u)$ is the generating
function of texts with $k$ occurrences. Partial derivatives
$M_1(z) = \frac{\partial P}{\partial u}(z,1)$ and
$M_2(z) = \frac{\partial}{\partial u}\, u\, \frac{\partial}{\partial u} P(z,u) \big|_{u=1}$
are generating functions of the first and second moments
of the number of occurrences in a random text of length n.
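
As a quick check of the two derivative operators (a sketch that uses only the definition of $P(z,u)$ given above), one can apply them to a single monomial $u^k$ and read off the coefficient of $z^n$:

```latex
\frac{\partial}{\partial u}\,u^k\Big|_{u=1} = k,
\qquad
\frac{\partial}{\partial u}\,u\,\frac{\partial}{\partial u}\,u^k\Big|_{u=1}
  = \frac{\partial}{\partial u}\bigl(k\,u^{k}\bigr)\Big|_{u=1} = k^2,
\qquad\text{hence}\qquad
[z^n]\,M_1(z) = \sum_{k\ge 0} k\,p_{n,k} = \mathbf{E}(X_n),
\quad
[z^n]\,M_2(z) = \sum_{k\ge 0} k^2\,p_{n,k} = \mathbf{E}(X_n^2).
```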

Our first result characterizes these generating functions as effectively computable 
rational functions. 

Theorem 1. Let R be a regular expression, $X_n$ the number of occurrences of R in a
random text of size n, and $p_{n,k} = \Pr\{X_n = k\}$ the corresponding probability
distribution. Then, in the overlapping or in the non-overlapping case, and under either the
Bernoulli model or the Markov model, the generating functions $P(z,u)$, $R(z)$, $M_1(z)$,
$M_2(z)$, corresponding to probabilities of number of occurrences, existence of a match,
and first and second moment of number of occurrences, are rational and can be computed
explicitly given R.

Our second result provides the corresponding asymptotics. Its statement relies on the 
fundamental matrix T{u) defined in §4, as well as the notion of primitivity, a technical 
but nonrestrictive condition, that is defined there. 

Theorem 2. Under the conditions of Th. 1, assume that the “fundamental matrix” $T(1)$
defined by (3) is primitive. Then, the mean and variance of $X_n$ grow linearly,
$\mathbf{E}(X_n) = \mu n + c_1 + O(A^n)$, $\mathrm{Var}(X_n) = \sigma^2 n + c_2 + O(A^n)$,
where $\mu \ne 0$, $\sigma \ne 0$, $c_1$, $c_2$ are computable constants.

The normalized variable, $(X_n - \mu n)/(\sigma\sqrt{n})$, converges with speed $O(1/\sqrt{n})$ to a
Gaussian law. A local limit and large deviation bounds also hold.



3 Algorithmic Chain 

In order to compute the probability generating function of the number of occurrences of a 
regular expression, we use classical constructions on non-deterministic and deterministic 
finite automata. For completeness, we state all the algorithms, old and new, leading to the 
probability generating functions of Th. 1. References for this section are [19,17,15,23] 
among numerous textbooks describing regular languages and automata. 

Regular Languages. We consider a finite alphabet $\Sigma = \{\ell_1, \ldots, \ell_r\}$. A word over
$\Sigma$ is a finite sequence of letters, that is, elements of $\Sigma$. A language over $\Sigma$ is a
set of words. The product $A = A_1 \cdot A_2$ of two languages $A_1$ and $A_2$ is
$A = \{w_1 w_2,\ w_1 \in A_1,\ w_2 \in A_2\}$,
where $w_1 w_2$ is the concatenation of words $w_1$ and $w_2$. Let $A^n$ be the set of products of
n words belonging to A, then the star closure $A^\star$ of a language A is the infinite union
$A^\star = \bigcup_{n \ge 0} A^n$. The language $\Sigma^\star$ is thus the collection of all possible
words over $\Sigma$.

Regular languages over $\Sigma$ are defined inductively. Such a language is either the
empty word, or it reduces to a single letter, or it is obtained by union, product or star
closure of simpler regular languages. The formula expressing a regular language in terms
of these operations and letters is called a regular expression. As notational convenience,
$\ell$ denotes the singleton language $\{\ell\}$, + represents a union, and $\cdot$ is freely omitted. The
order of precedence for the operators is $\star$, $\cdot$, +.

A Nondeterministic Finite Automaton (or NFA) is formally specified by five elements:
(1) an input alphabet $\Sigma$; (2) a finite collection of states $Q$; (3) a start state $s \in Q$;
(4) a collection of final states $F \subseteq Q$; (5) a (possibly partial) transition function $\delta$ from
$Q \times \Sigma$ to $S_Q$, the set of subsets of $Q$. There exists a transition from state $q_i$ to state $q_j$
if there is a letter $\ell \in \Sigma$ such that $q_j \in \delta(q_i, \ell)$. A word
$w = w_1 w_2 \cdots w_n \in \Sigma^\star$ is accepted or recognized by an NFA
$A = (\Sigma, Q, s, F, \delta)$ if there exists a sequence of states
$q_0, q_1, q_2, \ldots, q_n$ such that $q_0 = s$, $q_j \in \delta(q_{j-1}, w_j)$ and $q_n \in F$.

Kleene’s theorem states that a language is regular if and only if it is recognized 
by a NFA. Several algorithms are known to construct such a NFA. We present below 
an algorithm due to [6] as improved by [8] that constructs a NFA called the Glushkov 
automaton. 

Algorithm 1. [Berry & Sethi] Input: a regular expression R over an alphabet $\Sigma$. Output: an NFA
recognizing the corresponding language.

1 Give increasing indices to the occurrences of each letter of $\Sigma$ occurring in R. Let $\Sigma'$ be the
alphabet consisting of these indexed letters.

2 For each letter $\ell \in \Sigma'$, construct the subset follow($\ell$) of $\Sigma'$ of letters that can follow
$\ell$ in a word recognized by R.

3 Compute the sets first(R) and last(R) of letters of $\Sigma'$ that can occur at the beginning and at
the end of a word recognized by R.

4 The automaton has as states the elements of $\Sigma'$ plus a start state. The transitions are obtained
using follow and erasing the indices. The final states are the elements of last(R).

Steps 2 and 3 are performed by computing inductively four functions “first”, “last”,
“follow” and “nullable”. Given a regular expression r over $\Sigma'$, first returns the set of
letters that can occur at the beginning of a match; last returns those that can occur at the
end of a match; nullable returns true if r recognizes the empty word and false otherwise;
for each $\ell \in \Sigma'$ that occurs in r, follow returns the set of letters that can follow $\ell$ in a
word recognized by r. The computation of these functions is a simple induction [8]; the
whole algorithm has a quadratic complexity.
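
The inductive computation of nullable, first, last and follow can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the tuple encoding of regular expressions ("sym", "eps", "alt", "cat", "star") and the function names `glushkov` and `accepts` are assumptions made for the example. Positions 1, 2, ... play the role of the indexed letters of the alphabet $\Sigma'$, and state 0 is the start state.

```python
def glushkov(regex):
    """Return (delta, finals) of a Glushkov-style NFA; state 0 is the start state.

    `regex` is a nested tuple (hypothetical encoding): ("eps",), ("sym", letter),
    ("alt", r1, r2), ("cat", r1, r2) or ("star", r).
    """
    letters = {}   # position -> letter, i.e. the indexed alphabet
    follow = {}    # position -> set of positions that may come next

    def visit(node):
        """Return (nullable, first, last) and fill `letters` and `follow`."""
        kind = node[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            pos = len(letters) + 1          # step 1: index each occurrence
            letters[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "star":
            _, first, last = visit(node[1])
            for p in last:                  # a last position can loop back
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = visit(node[1])
        n2, f2, l2 = visit(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            for p in l1:                    # last of r1 is followed by first of r2
                follow[p] |= f2
            return (n1 and n2,
                    (f1 | f2) if n1 else f1,
                    (l1 | l2) if n2 else l2)
        raise ValueError("unknown node kind: %r" % (kind,))

    nullable, first, last = visit(regex)
    delta = {}  # (state, letter) -> set of states, indices erased on the labels
    for p in first:
        delta.setdefault((0, letters[p]), set()).add(p)
    for p, nexts in follow.items():
        for q in nexts:
            delta.setdefault((p, letters[q]), set()).add(q)
    finals = set(last) | ({0} if nullable else set())
    return delta, finals


def accepts(delta, finals, word):
    """Simulate the NFA on `word` by tracking the set of active states."""
    states = {0}
    for letter in word:
        states = set().union(*(delta.get((q, letter), set()) for q in states))
    return bool(states & finals)
```

For instance, the expression (a+b)*ab is encoded as `("cat", ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a")), ("sym", "b"))`; it has four letter occurrences, so the resulting automaton has five states.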

Deterministic Finite Automata (or DFAs) are special cases of NFAs where the images
of the transition function are singletons. By a classical theorem of Rabin & Scott, NFAs 
are equivalent to DFAs in the sense that they recognize the same class of languages. This 
is made effective by the powerset construction. 

Algorithm 2. [Rabin & Scott] Input: an NFA $A = (\Sigma, Q, s, F, \delta)$. Output: a DFA recognizing
the same language.

1 Define a transition function $\Delta : S_Q \times \Sigma \to S_Q$ by: $\forall V \in S_Q$, $\forall \ell \in \Sigma$,
$\Delta(V, \ell) = \bigcup_{q \in V} \delta(q, \ell)$, where $S_Q$ is the set of subsets of $Q$.

2 Define $Q_F$ as the set of subsets of $Q$ that contain at least one element of $F$.

3 Return the automaton $(\Sigma, S_Q, \{s\}, Q_F, \Delta)$.

The number of states of the DFA constructed in this way is not necessarily minimal. 
In the worst case, the construction is of exponential complexity in the number of states 
of the NFA. For applications to motifs however, this construction is done in reasonable 
time in most cases (see §6). 
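
A minimal sketch of the powerset construction follows. It departs from the statement of the algorithm in one respect, which is an assumption of this example: only the subsets reachable from {s} are generated, which is what keeps the construction affordable in practice. The function names `determinize` and `run_dfa` are illustrative.

```python
from collections import deque


def determinize(delta, start, finals, alphabet):
    """Subset construction restricted to the subsets reachable from {start}.

    `delta` maps (state, letter) to a set of NFA states; missing keys mean
    that there is no transition. Returns (dfa_delta, dfa_start, dfa_finals).
    """
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for letter in alphabet:
            # Union of the images of all NFA states in the current subset.
            target = frozenset(
                q2 for q in subset for q2 in delta.get((q, letter), ())
            )
            dfa_delta[(subset, letter)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {subset for subset in seen if subset & set(finals)}
    return dfa_delta, start_set, dfa_finals


def run_dfa(dfa_delta, dfa_start, dfa_finals, word):
    """Return True when the DFA accepts `word`."""
    state = dfa_start
    for letter in word:
        state = dfa_delta[(state, letter)]
    return state in dfa_finals
```

On the three-state NFA for words over {a, b} ending with ab (transitions 0 to {0, 1} on a, 0 to {0} on b, 1 to {2} on b), this produces a three-state DFA.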

Generating Functions. Let $A$ be a language over $\Sigma$. The generating function of the
language is obtained by summing formally all the words of $A$ and collecting the resulting
monomials with the letters being allowed to commute. The generating function
of the language $A$ is then defined as the formal sum
$A(\ell_1, \ldots, \ell_r) = \sum_{w \in A} \mathrm{com}(w)$,
with $\mathrm{com}(w) = w_1 w_2 \cdots w_n$ the monomial associated to $w = w_1 w_2 \cdots w_n \in A$,
and $\mathrm{com}(\varepsilon) = 1$. We use the classical notation $[\ell_1^{i_1} \cdots \ell_r^{i_r}]A$ to
denote the coefficient of $\ell_1^{i_1} \cdots \ell_r^{i_r}$ in the generating function $A$. (There is a
slight abuse of notation in using the same symbols for the alphabet and the variables, which
makes notation simpler.)
Algorithm 3. [Chomsky & Schützenberger] Input: A regular expression. Output: Its generating function.

1 Construct the DFA recognizing the language. For each state $q$, let $\mathcal{L}_q$ be the language of
words recognized by the automaton with $q$ as start state. These languages are connected by
linear relations, $\mathcal{L}_q = (\varepsilon\, +) \bigcup_{\ell \in \Sigma} \ell\, \mathcal{L}_{\delta(q,\ell)}$, where
$\varepsilon$ is present when $q$ is a final state. The automaton being deterministic, the unions in this
system are disjoint.

2 Translate this system into a system of equations for the associated generating functions:
$L_q = (1\, +) \sum_{\ell \in \Sigma} \ell\, L_{\delta(q,\ell)}$.

3 Solve the system and get the generating function $F = L_s$, where $s$ is the start state.

The resulting generating function is rational, as it is the solution of a linear system [9].
Naturally, the algorithm specializes in various ways when numerical weights (probabilities)
are assigned to letters of the alphabet.
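
When probabilities are substituted for the letters, the linear system on the states of the DFA can also be read as a recurrence on coefficients. The sketch below does not solve the system symbolically, as a computer algebra system would; it only iterates the recurrence to obtain the first coefficients of the generating function, that is, the probability that a random word of each length is accepted. The function name and argument layout are assumptions of this example.

```python
from fractions import Fraction


def acceptance_probabilities(delta, states, finals, start, probs, n_max):
    """Return the list of [z^n] L_start(p_1 z, ..., p_r z) for n = 0..n_max.

    `delta` maps (state, letter) to a state (complete DFA) and `probs` maps
    each letter to its probability.
    """
    # Coefficient of z^0: 1 for final states (the empty word), 0 otherwise.
    a = {q: Fraction(int(q in finals)) for q in states}
    out = [a[start]]
    for _ in range(n_max):
        # One step of L_q = (1 +) sum_l p_l z L_{delta(q, l)}, coefficientwise.
        a = {
            q: sum((p * a[delta[(q, letter)]] for letter, p in probs.items()),
                   Fraction(0))
            for q in states
        }
        out.append(a[start])
    return out
```

For the DFA recognizing the words over {a, b} that end with ab, with both letters of probability 1/2, the coefficients are 0, 0, 1/4, 1/4, ..., in agreement with the rational function $z^2/(4(1-z))$.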

Regular Expression Matches. We first consider the Bernoulli model. The letters of
the text are drawn independently at random, each letter $\ell_i$ of the alphabet having a
fixed probability $p_i$, and $\sum p_i = 1$. The basis of the proof of Th. 1 is the following
construction.

Algorithm 4. [Marked automaton] Input: A regular expression R over the alphabet $\Sigma$. Output: A DFA
recognizing the (regular) language of words over $\Sigma \cup \{m\}$ where each match of the regular
expression R is followed by the letter $m \notin \Sigma$, which occurs only there.

1 Construct a DFA $A = (Q, s, F, \Sigma, \delta)$ recognizing $\Sigma^\star R$.

2 Initialize the resulting automaton: set $A' = (Q', s, Q, \Sigma + m, \delta')$ with initial values
$\delta' = \delta$ and $Q' = Q$.

3 Mark the matches of R: for all $q \in Q$ and all $\ell \in \Sigma$ such that $\delta(q, \ell) = f \in F$,
create a new state $q_\ell$ in $Q'$, set $\delta'(q, \ell) := q_\ell$ and $\delta'(q_\ell, m) := f$.

4 Restart after match (non-overlap case only): for all $f \in F$, and all $\ell \in \Sigma$ set
$\delta'(f, \ell) := \delta(s, \ell)$.

5 Return $A'$.

We note that the automaton constructed in this way is deterministic since all the 
transitions that have been added are either copies of transitions in A, or start from a new 
state, or were missing. 
This automaton recognizes the desired language. Indeed, the words of $\Sigma^\star R$ are all
the words of $\Sigma^\star$ ending with a match of R. Thus the final states of A are reached only
at the end of a match of R. Conversely, since no letter is read in advance, every time a 
match of R has just been read by A, the state which has been reached is a final state. Thus 
inserting a non-final state and a marked transition “before” each final state corresponds 
to reading words with the mark m at each position where a match of R ends. Then by 
making all the states final except those intermediate ones, we allow the words to end 
without it being the end of a match of R. In the non-overlapping case, the automaton 
is modified in step 4 to start afresh after each match. (This construction can produce 
states that are not reachable. While this does not affect the correctness of the rest of the 
computation, suppressing these states saves time.) 
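
The effect of the marks can be illustrated numerically without building the marked automaton explicitly. The sketch below is an illustration under stated assumptions, not the construction of the paper: it takes a complete DFA for $\Sigma^\star R$, counts one match each time a final state is entered, and, in the non-overlapping case, treats every final state as the start state for the next letter. The function name `match_distribution` is hypothetical.

```python
from fractions import Fraction


def match_distribution(delta, start, finals, probs, n, overlapping=True):
    """Return {k: Pr(X_n = k)} for a random Bernoulli text of length n.

    `delta` maps (state, letter) to a state of a DFA recognizing Sigma* R,
    and `probs` maps each letter to its probability.
    """
    dist = {(start, 0): Fraction(1)}   # (DFA state, matches so far) -> probability
    for _ in range(n):
        new = {}
        for (q, k), pr in dist.items():
            # Non-overlapping case: after a match, read on from the start state.
            src = start if (q in finals and not overlapping) else q
            for letter, p in probs.items():
                t = delta[(src, letter)]
                key = (t, k + (t in finals))   # entering a final state marks a match
                new[key] = new.get(key, Fraction(0)) + pr * p
        dist = new
    result = {}
    for (_, k), pr in dist.items():
        result[k] = result.get(k, Fraction(0)) + pr
    return result
```

With the pattern aa over {a, b}, the text aaa contains two overlapping matches but a single non-overlapping one, so the two statistics already differ for n = 3.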

The proof of Th. 1 is concluded by the following algorithm in the Bernoulli model.

Algorithm 5. [Number of matches - Bernoulli] Input: A regular expression R over an alphabet $\Sigma$ and
the probabilities $p_i$ of occurrence of each letter $\ell_i \in \Sigma$. Output: The bivariate generating
function for the number of occurrences of R in a random text according to the Bernoulli model.

1 Construct the marked automaton for R.

2 Return the generating function $F(p_1 z, \ldots, p_r z, u)$ of the corresponding language, as given
by the Chomsky-Schützenberger Algorithm.

The proof of Th. 1 in the Markov model follows along similar lines. It is based on 
an automaton that keeps track of the letter most recently read. 

Algorithm 6. [Markov automaton] Input: A DFA A over an alphabet $\Sigma$. Output: A DFA over the alphabet
$(\ell_0 + \Sigma)^2$, where $\ell_0 \notin \Sigma$. For each word $w_1 \cdots w_n$ recognized by A, this DFA
recognizes the word $(\ell_0, w_1)(w_1, w_2) \cdots (w_{n-1}, w_n)$.

1 Duplicate the states of A until there are only input transitions with the same letter for each
state. Let $(Q, s, F, \Sigma, \delta)$ be the resulting automaton.

2 Define a transition function $\Delta : Q \times (\ell_0 + \Sigma)^2 \to Q$ by
$\Delta(\delta(q, \ell), (\ell, \ell')) = \delta(\delta(q, \ell), \ell')$
for all $q \in Q \setminus \{s\}$, and $\ell, \ell' \in \Sigma$; and
$\Delta(\delta(s, \ell), (\ell_0, \ell)) = \delta(s, \ell)$ for all $\ell \in \Sigma$.

3 Return $(Q, s, F, (\ell_0 + \Sigma)^2, \Delta)$.

This construction then gives access to the bivariate generating function.

Algorithm 7. [Number of matches - Markov] Input: A regular expression R over an alphabet $\Sigma$, the
probabilities $q_{ij}$ of transition from letter $\ell_i$ to $\ell_j$ and the probabilities $q_{0j}$ of
starting with letter $\ell_j$ for all $\ell_i, \ell_j \in \Sigma$. Output: The bivariate generating function
for the number of occurrences of R in a random text according to the Markov model.

1 Apply the algorithm “Marked automaton” with “Markov automaton” as an extra step between
steps 1 and 2.

2 Return the generating function $F(q_{01} z, \ldots, q_{rr} z, u)$ of the corresponding language.

This concludes the description of the algorithmic chain, hence the proof of Th. 1, as
regards the bivariate generating function $P(z, u)$ at least. The other generating functions
then derive from P in a simple manner. □



4 Limiting Distribution 

In this section, we establish the limiting behaviour of the probability distribution of the
number of occurrences of a regular expression R in a random text of length n and prove
that it is asymptotically Gaussian, thereby establishing Th. 2. Although this fact could
be alternatively deduced from limit theorems for Markov chains, the approach we adopt
has the advantage of fitting nicely with the computational approach of the present paper.
In this abstract, only a sketch of the proof is provided.

Streamlined proof. The strategy of proof is based on a general technique of singularity
perturbation, as explained in [11], to which we refer for details. This technique relies on an analysis
of the bivariate generating function $P(z, u)$. The analysis reduces to establishing that in a fixed
neighbourhood of $u = 1$, $P(z, u)$ behaves as

$$\frac{c(u)}{1 - z\lambda(u)} + g(z, u), \qquad (1)$$

with $c(1) \ne 0$, $c(u)$ and $\lambda(u)$ analytic in the neighbourhood of $u = 1$ and $g(z, u)$ analytic
in $|z| < \delta$ for some $\delta > 1/\lambda(1)$ independent of $u$. Indeed, if this is granted, there follows

$$[z^n]P(z, u) = c(u)\lambda(u)^n (1 + O(A^n)), \qquad (2)$$

for some $A < 1$. The last equation says that $X_n$ has a generating function that closely resembles a
large power of a fixed function, that is, the probability generating function of a sum of independent
random variables. Thus, we are close to a case of application of the central limit theorem and of
Lévy’s continuity theorem for characteristic functions [7]. This part of our treatment is in line with
the pioneering works [3,5] concerning limit distributions in combinatorics. Technically, under the
“variability condition”, namely $\lambda''(1) + \lambda'(1) - \lambda'(1)^2 \ne 0$, we may conveniently appeal to
the quasi-powers theorem [16] that condenses the consequences drawn from analyticity and the
Berry-Esseen inequalities. This implies convergence to the Gaussian law with speed $O(1/\sqrt{n})$,
the expectation and the variance being

$$\mathbf{E}(X_n) = n\lambda'(1) + c_1 + O(A^n), \qquad
\mathrm{Var}(X_n) = n(\lambda''(1) + \lambda'(1) - \lambda'(1)^2) + c_2 + O(A^n),$$
$$c_1 = c'(1), \qquad c_2 = c''(1) + c'(1) - c'(1)^2.$$


Linear structure. We now turn to the analysis leading to (1). Let A be the automaton
recognizing $\Sigma^\star R$ and let $m$ be its number of states. In accordance with the developments of §3, the
matrix equation computed by Algorithm 3 for the generating functions can be written
$L = zT_0L + \mathbf{e}$, where $\mathbf{e}$ is a vector whose $i$th entry is 1 if state $i$ is final and zero
otherwise. The matrix $T_0$ is a stochastic matrix (i.e., the entries in each of its rows add up to 1).
The entry $t_{i,j}$ in $T_0$, for $i, j \in \{1, \ldots, m\}$, is the probability of reaching state $j$ from
state $i$ of the automaton in one step. In the overlapping case, the construction of Algorithm 5 produces
a system equivalent to $L = zT_0\, \mathrm{diag}(\phi_i) L + \mathbf{1}$, $\phi_i \in \{1, u\}$, where
$\mathbf{1}$ is a vector of ones since all the states of the new automaton are final, and $\phi_i = u$ when
state $i$ of A is final, and 1 otherwise. In the non-overlapping case, the system has the same shape; the
transitions from the final states are the same as the transitions from the start state, which is obtained
by replacing the rows corresponding to the final states by that corresponding to the start state.

Thus, up to a renumbering of states, the generating function $P(z, u)$ is obtained as the first
component of the vector $L$ in the vector equation

$$L = zT(u)L + \mathbf{1}, \qquad (3)$$

with $T(u) = T_0\, \mathrm{diag}(1, \ldots, 1, u, \ldots, u)$, the number of $u$’s being the number of final
states of A. Eq. (3) implies $P(z, u) = (1, 0, \ldots, 0)L = B(z, u)/\det(I - zT(u))$, for some
polynomial $B(z, u)$, where $I$ denotes the $m \times m$ identity matrix. The matrix $T(u)$ is called the
fundamental matrix of the pattern R.
Perron-Frobenius properties. One can resort to results on matrices with nonnegative entries [12,21]
to obtain precise information on the location of the eigenvalue of $T(u)$ of largest modulus. Such
eigenvalues determine dominant asymptotic behaviours and in particular they condition (1).

The Perron-Frobenius theorem states that if the matrix $T(u)$ ($u > 0$) is irreducible and
additionally primitive, then it has a unique eigenvalue $\lambda(u)$ of largest modulus, which is real
positive. (For an $m \times m$-matrix $A$, irreducibility means that $(I + A)^m \gg 0$ and primitivity means
$A^e \gg 0$, for some $e$, where $X \gg 0$ iff all the entries of $X$ are positive.) In the context of automata,
irreducibility means that from any state, any other state can be reached (possibly in several steps);
primitivity means that there is a large enough $e$ such that for any pair $(i, j)$ of states, the probability
of reaching $j$ from $i$ in exactly $e$ steps is positive. (Clearly, primitivity implies irreducibility.) In
the irreducible case, if the matrix is not primitive, then there is a periodicity phenomenon and an
integer $k \le m$ such that $T(u)^k$ is “primitive by blocks”. Irreducibility and primitivity are easily
tested algorithmically.
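
One possible way of testing both properties is sketched below; it is not necessarily the test used by the authors. It works on the zero/nonzero pattern of the matrix only. Irreducibility is checked through $(I + A)^m \gg 0$; for primitivity the sketch relies on Wielandt's classical bound, according to which a primitive $m \times m$ matrix satisfies $A^e \gg 0$ already for $e = (m-1)^2 + 1$. The function names are illustrative.

```python
def bool_mul(a, b):
    """Product of two square Boolean matrices (lists of rows)."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bool_power(a, e):
    """Boolean matrix power computed by binary powering."""
    n = len(a)
    result = [[i == j for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = bool_mul(result, a)
        a = bool_mul(a, a)
        e >>= 1
    return result


def is_irreducible(t):
    """True when (I + T)^m has only positive entries."""
    m = len(t)
    support = [[bool(x) or i == j for j, x in enumerate(row)]
               for i, row in enumerate(t)]
    return all(all(row) for row in bool_power(support, m))


def is_primitive(t):
    """True when T^e has only positive entries for e = (m - 1)^2 + 1."""
    m = len(t)
    support = [[bool(x) for x in row] for row in t]
    return all(all(row) for row in bool_power(support, (m - 1) ** 2 + 1))
```

For example, the cyclic matrix `[[0, 1], [1, 0]]` is irreducible but not primitive (period 2), while the transition matrix `[[0.5, 0.5, 0], [0, 0.5, 0.5], [0.5, 0.5, 0]]` of the automaton for words ending with ab is primitive.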

Gaussian distribution. Consider the characteristic polynomial of the fundamental matrix,
$Q(\lambda) \equiv Q(\lambda, u) = \det(\lambda I - T(u))$, where $T(u)$ is assumed to be primitive. By the
Perron-Frobenius theorem, for each $u > 0$, there exists a unique root $\lambda(u)$ of $Q(\lambda)$ of maximal
modulus that is a positive real number. The polynomial $Q$ has roots that are algebraic in $u$ and therefore
continuous. Uniqueness of the largest eigenvalue of $T(u)$ then implies that $\lambda(u)$ is continuous and is
actually an algebraic function of $u$ for $u > 0$. Thus there exist an $\epsilon > 0$ and two real numbers
$\eta_1 > \eta_2$ such that for $u$ in a neighbourhood $(1 - \epsilon, 1 + \epsilon)$ of 1,
$\lambda(u) > \eta_1 > \eta_2 > |\mu(u)|$, for any other eigenvalue $\mu(u)$.

The preceding discussion shows that in the neighbourhood $u \in (1 - \epsilon, 1 + \epsilon)$,

$$P(z, u) = \frac{B(\lambda^{-1}(u), u)}{\lambda^{1-m}(u)\, Q'(\lambda(u))\, (1 - z\lambda(u))} + g(z, u),$$

where $g$ is analytic in $z$ with radius of convergence at least $1/\eta_2$. This proves (1). Then, the residue
theorem applied to the integral $I_n(u) = \frac{1}{2i\pi} \oint_\gamma P(z, u)\, \frac{dz}{z^{n+1}}$, where
$\gamma$ is a circle around the origin of radius $\delta = 2/(\eta_1 + \eta_2)$, yields (2).

The variability condition is now derived by adapting an argument in [30] relative to analytic
dynamic sources in information theory, which reduces in our case to using the Cauchy-Schwarz
inequality. For the $L_1$ matrix norm, $\|T^n(u)\|$ is a polynomial in $u$ with nonnegative coefficients. It
follows that $\|T^n(uv)\| \le \|T^n(u^2)\|^{1/2}\, \|T^n(v^2)\|^{1/2}$. Since for any matrix $T$, the modulus of
the largest eigenvalue of $T$ is $\lim_{n \to \infty} \|T^n\|^{1/n}$, we get
$\lambda(uv) \le \lambda(u^2)^{1/2} \lambda(v^2)^{1/2}$, $\forall u, v > 0$. This
inequality reads as a convexity property for $\phi(t) := \log \lambda(e^t)$:
$\phi((x + y)/2) \le (\phi(x) + \phi(y))/2$, for any real $x$ and $y$. If this inequality is strict in a
neighbourhood of 0, then $\phi'' > 0$; note that $\phi''(0) = \lambda''(1) + \lambda'(1) - \lambda'(1)^2$ since
$\lambda(1) = 1$, so that this is the variability condition. (The case where
$\phi''(0) = 0$ is discarded since $\lambda(u)$ is nondecreasing.) Otherwise, if there exist $x < 0$ and $y > 0$
such that the equality holds in the convexity relation for $\phi$, then necessarily equality also holds
in the interval $(x, y)$ and $\phi$ is actually affine in this interval. This in turn implies
$\lambda(u) = au^b$ for some real $a$ and $b$ and $u$ in an interval containing 1, and therefore equality holds
for all $u > 0$ from the Perron-Frobenius theorem as already discussed. Since $\lambda(1) = 1$, necessarily
$a = 1$. From the asymptotic behaviour (2) follows that $b \le 1$. Now $\lambda$ being a root of $Q(\lambda)$, if
$\lambda(u) = u^b$ with $b < 1$, then $b$ is a rational number $p/q$ and the conjugates
$e^{2ik\pi/q}\lambda$, $k = 1, \ldots, q - 1$ are
also solutions of $Q(\lambda)$, which contradicts the Perron-Frobenius theorem. Thus the only possibility
for $b$ is 1. Now, $u$ is an eigenvalue of $uT(1)$ and another property of nonnegative matrices [21,
Th. 37.2.2] shows that the only way $u$ can be an eigenvalue of $T(u)$ is when $T(u) = uT(1)$, which
can happen only when all the states of the automaton are final, i.e., $\Sigma^\star R = \Sigma^\star$, or,
equivalently $\varepsilon \in R$. This concludes the proof of Th. 2 in the Bernoulli case.

Markov model. The Markov case requires a tensor product construction induced by Algorithms 6
and 7. This gives rise again to a linear system that is amenable to singularity perturbation. The
condition of primitivity is again essential but it is for instance satisfied as soon as both the Markov
model and the pattern automaton are primitive. (Details omitted in this abstract.) This discussion
concludes the proof of Th. 2. □

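We observe that the quantities given in the statement are easily computable. Indeed, from the
characteristic polynomial $Q$ of $T(u)$, the quantities involved in the expectation and variance of
the statement of Th. 2 are

$$\lambda'(1) = -\frac{\partial Q/\partial u}{\partial Q/\partial \lambda}\bigg|_{u=\lambda=1},
\qquad
\lambda''(1) = -\frac{\dfrac{\partial^2 Q}{\partial u^2}
  + 2\lambda'(1)\dfrac{\partial^2 Q}{\partial u\,\partial \lambda}
  + \lambda'(1)^2\dfrac{\partial^2 Q}{\partial \lambda^2}}
  {\dfrac{\partial Q}{\partial \lambda}}\Bigg|_{u=\lambda=1}.$$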
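
These expressions follow from differentiating the identity $Q(\lambda(u), u) = 0$ once and twice with respect to $u$. As an illustration that is not part of the paper, take again a single-letter pattern of probability $p$ (with $q = 1 - p$), whose automaton has two states:

```latex
T(u) = \begin{pmatrix} q & pu \\ q & pu \end{pmatrix},
\qquad
Q(\lambda, u) = \det(\lambda I - T(u)) = \lambda^2 - \lambda\,(q + pu),
\\[4pt]
\frac{\partial Q}{\partial u}\Big|_{(1,1)} = -p,\quad
\frac{\partial Q}{\partial \lambda}\Big|_{(1,1)} = 2 - (q + p) = 1,\quad
\frac{\partial^2 Q}{\partial u^2} = 0,\quad
\frac{\partial^2 Q}{\partial u\,\partial\lambda} = -p,\quad
\frac{\partial^2 Q}{\partial \lambda^2} = 2,
\\[4pt]
\lambda'(1) = p,
\qquad
\lambda''(1) = -\bigl(0 + 2p\cdot(-p) + p^2\cdot 2\bigr) = 0,
```

in agreement with the dominant eigenvalue $\lambda(u) = q + pu$.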



We end this section with a brief discussion showing how the “degenerate” cases in which $T(1)$
is not primitive are still reducible to the case when Th. 2 applies.

Irreducibility. The first property we have used is the irreducibility of $T(1)$. It means that from any
state of the automaton, any other state can be reached. In the non-overlapping case, this property
is true except possibly for the start state, since after a final state each of the states following the
start state can be reached. In the overlapping case, the property is not true in general, but since
the generating function $P(z, u)$ does not depend on the choice of automaton recognizing $\Sigma^\star R$,
we can assume that the automaton is minimal (has the minimum number of states), and then the
property becomes true after a finite number of steps by an argument we omit in this abstract. Thus
in both cases, $T(u)$ is either irreducible or decomposes in block-triangular form with a diagonal
block $A(u)$ that is irreducible, and it can be checked that the largest eigenvalue arises from the
$A$-block for $u$ near 1. It is thus sufficient to consider the irreducible case.

Primitivity. When $T(u)$ is not primitive, there is an integer $k \le m$ such that $T^k(u)$ is primitive.
Thus our theorem applies to each of the variables $X_n^{(i)}$ counting the number of matches of the
regular expression R in a text of length $kn + i$ for $i = 0, \ldots, k - 1$. Then, the theorem still holds
once $n$ is restricted to any congruence class modulo $k$.




Fig. 1. The correlations between |R|, |D| and T, in logarithmic scales.



5 Processing Generating Functions 

Once a bivariate generating function of probabilities has been obtained explicitly, several 
operations can be performed efficiently to retrieve information. 
Fig. 2. Motifs with theoretical expectation
E > 2. Each point corresponds to a motif 
with coordinates {E, O) plotted on a log-log 
scale. The two curves represent an approxi- 
mation of ±3 standard deviations. 




Fig. 4. Motifs with theoretical expectation
E > 2: Histogram of the Z-scores $Z = (O - E)/\sigma$.




Fig. 3. Histograms of motifs with 1 (dark
gray), 2 (medium gray) and 3 (white)
observed matches. Coordinates: $x = \log_{10} E$,
$y$ = number of motifs.




Fig. 5. Scanning Prodom with motif 
PS00013. Observed matches versus expec- 
tation. 



First, differentiating with respect to u and setting u = 1 yields univariate generating 
functions for the moments of the distribution as explained in §2. By construction, these 
generating functions are also rational. 

Fast coefficient extraction. The following algorithm is classical and can be found in [18].
It is implemented in the Maple package gfun [27].

Algorithm 8. [Coefficient extraction] Input: a rational function $f(z) = P(z)/Q(z)$ and an integer $n$.
Output: $u_n = [z^n]f(z)$ computed in $O(\log n)$ arithmetic operations.

1 Extract the coefficient of $z^n$ in $Q(z)f(z) = P(z)$, which yields a linear recurrence with
constant coefficients for the sequence $u_n$. The order $m$ of this recurrence is $\deg(Q)$.

2 Rewrite this recurrence as a linear recurrence of order 1 relating the vector
$U_n = (u_n, \ldots, u_{n-m+1})$ to $U_{n-1}$ by $U_n = AU_{n-1}$ where $A$ is a constant $m \times m$ matrix.

3 Use binary powering to compute the power of $A$ in $U_n = A^{n-m}U_m$.
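
The three steps can be sketched as follows with exact rational arithmetic. This is an illustrative implementation and not the gfun code; the function names and the coefficient-list representation (constant term first) are assumptions of the example. The initial vector is taken at index $n_0 = \max(\deg P, m - 1)$ rather than $m$, so that the recurrence is homogeneous from there on whatever the degree of the numerator.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def coefficient(p, q, n):
    """Return [z^n] P(z)/Q(z); p and q are coefficient lists, constant term first."""
    p = [Fraction(c) for c in p]
    q = [Fraction(c) for c in q]
    m = len(q) - 1                       # order of the recurrence, deg(Q)
    n0 = max(len(p) - 1, m - 1, 0)       # the recurrence is homogeneous beyond n0
    u = []
    for i in range(min(n, n0) + 1):      # step 1: first terms from Q(z) f(z) = P(z)
        rhs = p[i] if i < len(p) else Fraction(0)
        rhs -= sum(q[j] * u[i - j] for j in range(1, min(i, m) + 1))
        u.append(rhs / q[0])
    if n <= n0:
        return u[n]
    if m == 0:                           # f is a polynomial of degree at most n0
        return Fraction(0)
    # Step 2: companion matrix A such that U_n = A U_{n-1}.
    a = [[-q[j] / q[0] for j in range(1, m + 1)]]
    a += [[Fraction(int(i == j)) for j in range(m)] for i in range(m - 1)]
    # Step 3: binary powering, O(log n) matrix products.
    power = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    e = n - n0
    while e:
        if e & 1:
            power = mat_mul(power, a)
        a = mat_mul(a, a)
        e >>= 1
    vec = [[u[n0 - i]] for i in range(m)]
    return mat_mul(power, vec)[0][0]
```

For instance, `coefficient([0, 1], [1, -1, -1], n)` returns the nth Fibonacci number, since $z/(1 - z - z^2)$ is the generating function of that sequence.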

As an example, Fig. 6 displays the probability that the pattern ACAGAC occurs
exactly twice in a text over the alphabet {A,C,G,T} against the length n of the text. The
probabilities assigned to each of the letters are taken from a viral DNA ($\phi$X174). The
shape of the curve is typical of that expected in the non-asymptotic regime.

Fig. 6. Probability of two occurrences of ACAGAC in a text of length up to 20,000

Asymptotics of the coefficients of a rational function can be obtained directly. Since the
recurrence satisfied by the coefficients is linear with constant coefficients, a solution can
be found in the form of an exponential polynomial: $u_n = p_1(n)\lambda_1^n + \cdots + p_k(n)\lambda_k^n$,
where the $\lambda_i$’s are roots of the polynomial $z^m Q(1/z)$ and the $p_i$’s are polynomials. An
asymptotic expression follows from sorting the $\lambda_i$’s by decreasing modulus. When the
degree of $Q$ is large, it is possible to avoid part of the computation; this is described
in [13].

The exponential polynomial form explains the important numerical instability of the
computation when the largest eigenvalue of the matrix (corresponding to the largest $\lambda$)
is 1, which Th. 2 shows to be the case in applications: if the probabilities of the transitions
do not add up exactly to 1, this error is magnified exponentially when computing moments
for large values of n. This is another motivation for using computer algebra in such
applications, and, indeed, numerical stability problems are encountered by
colleagues working with conventional programming languages.

The solution of linear systems is the bottleneck of our algorithmic chain. In the 
special case when one is interested only in expectation and variance of the number of 
occurrences of a pattern, it is possible to save time by computing only the local behaviour 
of the generating function. This leads to the following algorithm for the expectation; the
computation of the variance is similar.

Algorithm 9. [Asymptotic Expectation] Input: the bivariate system $(I - zT(u))L - \mathbf{1} = 0$ from (3).
Output: first two terms of the asymptotic behaviour of the expectation of the number of occurrences
of the corresponding regular expression.

1 Let $A_1 = T(1)$, $A_0 = I - T(1)$, $C_0 = -\frac{\partial T}{\partial u}(1)$.

2 Solve the system $A_0X_1 + \alpha\mathbf{1} = -C_0$, whence a value for $\alpha$ and a line
$\tilde{X}_1 + \beta\mathbf{1}$ for $X_1$.

3 Solve the system $A_0X_2 + \beta\mathbf{1} = C_0 - A_1\tilde{X}_1$ for $\beta$. The expectation is
asymptotically $E = \alpha n + \alpha - x + O(A^n)$ for some $A < 1$ and $x$ the coordinate of $X_1$
corresponding to the start state of the automaton.

Algorithm 9 reduces the computation of asymptotic expectation to the solution of a 
few linear systems with constant entries instead of one linear system with polynomial 
entries. This leads to a significant speed-up of the computation. Moreover, with due care,
the systems could be solved using floating-point arithmetic. (This last improvement will
be tested in the future; the current implementation relies on safe rational arithmetic.)

As can be seen from the exponential polynomial form, a nice feature of the expansion
of the expectation to two terms is that the remainder is exponentially small.

6 Implementation 

The theory underlying the present paper has been implemented principally as a collection
of routines in the Maple computer algebra system. Currently, only the Bernoulli
model and the non-overlapping case have been implemented. The implementation is
based mainly on the package combstruct (developed at Inria and a component of
the Maple V.5 standard distribution) devoted to general manipulations of combinatorial
specifications and generating functions. Use is also made of the companion Maple library
gfun which provides various procedures to deal with generating functions and
recurrences. About 1100 lines of dedicated Maple routines have been developed by one
of us (P. N.) on top of combstruct and gfun.

This raw analysis chain does not include optimizations and it has been assembled
with the sole purpose of testing the methodology we propose. It has been tested on a
collection of 1118 patterns described below and whose processing took about 10 hours
when distributed over 10 workstations. The computation necessitates an average of
6 minutes per pattern, but this average is driven up by a few very complex patterns. In
fact, the median of the execution times is only 8 seconds.

There are two main steps in the computation: construction of the automaton and
asymptotic computation of expectation and variance. Let R be the pattern, D the finite
automaton, and T the arithmetic complexity of the underlying linear algebra algorithms.
Then, the general bounds available are: $|R| \le |D| \le 2^{|R|}$, $T = O(|D|^3)$, as results
from the previous sections. (Sizes of R and D are defined as number of states of the
corresponding NFA or DFA.) Thus, the driving parameter is |R| and, eventually, the
computationally intensive phase is due to linear algebra. In practice, the exponential
upper bound on |D| appears to be extremely pessimistic. Statistical analysis of the 1118
experiments indicates that the automaton is constructed in time slightly worse than linear
in |D| and that |D| is almost always between |R| and $|R|^2$. The time taken by the second
step behaves roughly quadratically (in $O(|D|^2)$), which demonstrates that the sparseness
of the system is properly handled by our program. For most of the patterns, the overall
“pragmatic” complexity thus lies somewhere around $|R|^3$ or $|R|^4$ (Fig. 1).

7 Experimentation 

We now discuss a small campaign of experiments conducted on Prosite motifs intended 
to test the soundness of the methodological approach of this paper. No immediate bio- 
logical relevance is implied. Rather, our aim is to check whether the various quantities 
computed do appear to have statistical relevance. 

Combstruct and gfun are available at http://algo.inria.fr/libraries. The
motif-specific procedures are to be found at http://www.dkfz.de/tbi/people/nicodeme.

The biological target database, the “text”, is built from the consensus sequences of
the multi-alignments of Prodom34.2. This database has 6.75 million positions, each
occupied by one of 20 amino acids, so that it is long enough to provide matches for rare
motifs. Discarding a few motifs constrained to occur at the beginning or at the end of a
sequence (a question that we do not address here) leaves 1260 unconstrained motifs. For
1118 of these motifs (about 88% of the total) our implementation produces complete
results. With the current time-out parameter, the largest automaton treated has 946 states.
It is on this set of 1118 motifs that our experiments have been conducted.

For each motif, the computer algebra tools of the previous section have been used to
compute exactly the (theoretical) expectation $E$ and standard deviation $\sigma$ of the
number of matches. The letter frequencies that we use in the mathematical and the
computational model are the empirical frequencies in the database. Each theoretical
expectation $E$ is then compared to the corresponding number of observed matches (also
called observables), denoted by $O$, that is obtained by a straight scan of the 6.75 million
position Prodom database.
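The comparison just described can be illustrated with a minimal sketch. It is not the Maple chain used for the experiments: the function names are hypothetical, only a subset of the Prosite syntax is handled, matches are counted by start position (an assumption about the counting convention), and the expectation is computed only for fixed-length patterns, where linearity of expectation under the Bernoulli model gives $(n - m + 1)$ times the product of the per-position probabilities.

```python
import re
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def parse_prosite(pattern):
    """Parse e.g. '[ST]-x(2)-[DE]' into a list of (allowed_letters, lo, hi)."""
    token_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    elements = []
    for token in pattern.split("-"):
        m = re.fullmatch(token_re, token)
        if m is None:
            raise ValueError(f"unsupported element: {token!r}")
        core, lo, hi = m.groups()
        if core == "x":                      # any amino acid
            allowed = set(AMINO_ACIDS)
        elif core.startswith("["):           # one of the listed letters
            allowed = set(core[1:-1])
        elif core.startswith("{"):           # any letter except the listed ones
            allowed = set(AMINO_ACIDS) - set(core[1:-1])
        else:                                # a single letter
            allowed = {core}
        lo = int(lo) if lo else 1
        hi = int(hi) if hi else lo
        elements.append((allowed, lo, hi))
    return elements

def to_regex(elements):
    """Build a Python regular expression from parsed elements."""
    return "".join(
        "[" + "".join(sorted(allowed)) + "]{%d,%d}" % (lo, hi)
        for allowed, lo, hi in elements
    )

def count_match_starts(pattern, text):
    """Number of text positions at which some match of the pattern starts."""
    lookahead = re.compile("(?=" + to_regex(parse_prosite(pattern)) + ")")
    return sum(1 for _ in lookahead.finditer(text))

def bernoulli_expectation(pattern, freqs, n):
    """Expected number of matches in a random text of length n (fixed-length patterns only)."""
    elements = parse_prosite(pattern)
    if any(lo != hi for _, lo, hi in elements):
        raise ValueError("this sketch handles fixed-length patterns only")
    length = sum(lo for _, lo, _ in elements)
    p = prod(sum(freqs[a] for a in allowed) ** lo for allowed, lo, _ in elements)
    return (n - length + 1) * p
```

With uniform letter frequencies (an illustrative assumption; the experiments use the empirical frequencies of the database), `bernoulli_expectation("[ST]-x-[RK]", {a: 1 / 20 for a in AMINO_ACIDS}, n)` is about `0.01 * n`. Patterns with variable-length gaps need the automaton-based computation of the previous sections.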

Expectations. First, we discuss expectations $E$ versus observables $O$. For our reference
list of 1118 motifs, the theoretical expectations $E$ range from $10^{-[\ldots]}$ to $10^5$. The observed
occurrences $O$ range from 0 to 100,934, with a median at 1, while 0 is observed in about
12% of cases. Globally, we thus have a collection of motifs with fairly low expected
occurrence numbers, though a few do have high expected occurrences. Consider a motif
to be “frequent” if $E > 2$. Fig. 2 is our main figure: it displays in log-log scale points
that represent the 71 pairs $(E, O)$ for the frequent motifs, $E > 2$. The figure shows a
good agreement between the orders of growth of predicted $E$ and observed $O$ values:
(i) the average value of $\log_{10} O / \log_{10} E$ is 1.23 for these 71 motifs; (ii) the two curves
representing 3 standard deviations enclose most of the data.

Fig. 3 focuses on the classes of motifs observed $O = 1, 2, 3$ times in Prodom. For
each such class, a histogram of the frequency of observation versus $\log_{10} E$ is displayed.
These histograms illustrate the fact that some motifs with very small expectation are still
observed in the database. However, there is a clear tendency for motifs with smaller
(computed) expectations $E$ to occur less often: for instance, no motif whose expectation
is less than $10^{[\ldots]}$ occurs 3 times.

Z-scores. Another way to quantify the discrepancy between the expected and the observed
is by means of the Z-score, defined as $Z = (O - E)/\sigma$. Histograms of
the Z-scores for the frequent motifs ($E > 2$) should converge to a Gaussian curve if
the Bernoulli model applied strictly and if there were a sufficient number of
data corresponding to large values of $E$. Neither of these conditions is satisfied here, but
nonetheless, the histogram displays a sharply peaked profile tempered by a small number
of exceptional points.

Standard deviations. We now turn to a curious property of the Bernoulli model regarding 
standard deviations. At this stage this appears to be a property of the model alone. It would 
be of interest to know whether it says something meaningful about the way occurrences 
tend to fluctuate in a large number of observations. 



Note: The observed quantities were determined by the Prosite tools contained in the ISREC motif
toolbox, http://www.isrec.isb-sib.ch/ftp-server/.







P. Nicodeme, B. Salvy, and P. Flajolet 



Theoretical calculations show that when the expectation of the length between two
matches for a pattern is large, then $\sigma \approx \sqrt{E}$ is an excellent approximation of the standard
deviation. Strikingly enough, computation shows that for the 71 “frequent” patterns, we
have $0.4944 \le \log(\sigma)/\log(E) \le 0.4999$. (Use has been made of this approximation
when plotting (rough) confidence intervals of 3 standard deviations in Fig. 2.)
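As a small worked illustration of the Z-score under the approximation $\sigma \approx \sqrt{E}$, the sketch below recomputes two rows from the rounded $E$ and $O$ values of Table 1. The function name is hypothetical, and because the exact standard deviations are not reproduced here, the results only approximate the tabulated Z-scores.

```python
from math import sqrt

def z_score(observed, expected, sigma=None):
    """Z = (O - E) / sigma; falls back to sigma ~ sqrt(E) when sigma is not given."""
    if sigma is None:
        sigma = sqrt(expected)  # approximation discussed in the text, not the exact value
    return (observed - expected) / sigma

# (E, O) pairs taken from Table 1
rows = {
    "S-G-x-G": (2149, 3302),
    "[RK](2)-x-[ST]": (11209, 13575),
}

for pattern, (e, o) in rows.items():
    print(pattern, round(z_score(o, e), 1), round((o - e) / e, 2))
```

For the motif [ST]-x-[RK] the same approximation gives a value near -28.5 rather than the tabulated -30, which shows that $\sqrt{E}$ is only a rough stand-in for the exact $\sigma$.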



Table 1. Motifs with large Z-scores

Index  Pattern                                                      E      O      Z    (O-E)/E
2      S-G-x-G                                                      2149   3302   25   0.54
4      [RK](2)-x-[ST]                                               11209  13575  22   0.21
13     {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C             788    2073   46   1.63
36     [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]   2.75   37     20   12.45
190    C-{CPWHF}-{CPWR}-C-H-{CFYW}                                  25     173    29   5.86
5      [ST]-x-[RK]                                                  99171  90192  -30  -0.09



Discussion. The first blatant conclusion is that predictions (the expectation $E$) tend to
underestimate systematically what is observed ($O$). This was to be expected since the
Prosite patterns do have an a priori biological significance. A clearer discussion of this
point can be illustrated by an analogy with words in a large corpus of natural language,
such as observed with Altavista on the Web. The number of occurrences of a word such
as ‘deoxyribonucleic’ is very large (about 7000) compared to the probability (perhaps
$10^{-[\ldots]}$) assigned to it in the Bernoulli model. Thus, predictions on the category of patterns
that contain long (hence unlikely) words that can occur in the corpus are expected to
be gross underestimations. However, statistics for a pattern like “A (any_word) IS IN”
(590,000 matches) are more likely to be realistic. This naive observation is consistent
with the fact that Fig. 2 is more accurate for frequent patterns than for others, and it
explains why we have restricted most of our discussion to patterns such that $E > 2$. In
addition, we see that the scores computed are meaningful as regards orders of growth,
at least. This is supported by the fact that $\log O / \log E$ is about 1.23 (for the data of
Fig. 2), and by the strongly peaked shape of Fig. 4.

Finally we discuss the patterns that are “exceptional” according to some measure. The
largest automaton computed has 946 states and represents the expression $\Sigma^{*}R$ for the
motif PS00844 ([LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-[LIVA]-N-[STP]-x-P-[GA]).
The expectation for this motif is $1.87 \times 10^{-6}$, the standard
deviation 0.00136, while $O = 0$. This automaton corresponds to a finite set of patterns
whose cardinality is about $1.9 \times 10^{[\ldots]}$. The pattern with largest expectation is PS00006
([ST]-x(2)-[DE]), for which $E = 104633$ (and $O = 100934$) and the renewal time
between two occurrences is as low as 64 positions. The motifs with very exceptional
behaviours, $|Z| > 19$, are listed in Table 1. The motif PS00005 ([ST]-x-[RK]) is the only
motif that is clearly observed significantly less than expected.

We plot in Fig. 5 the number of observed and expected matches of PS00013 against
the number of characters of Prodom that have been scanned. The systematic deviation
from what is expected is the type of indication on the possible biological significance of
this motif that our approach can give.



8 Directions for Future Research 

There are several directions for further study: advancing the study of the Markov model; 
enlarging the class of problems in this range that are guaranteed to lead to Gaussian 
laws; conducting sensitivity analysis of Bernoulli or Markov models. We briefly address 
each question in turn. 

The Markov model. Although the Markov model on letters is in principle analytically 
and computationally tractable, the brute-force method given by algorithm “Markov au- 
tomaton” probably leaves room for improvements. We wish to avoid having to deal with 
finite-state models of size the product $|\Sigma| \times |Q|$, with $|\Sigma|$ the alphabet cardinality and
$|Q|$ the number of states of the automaton. This issue appears to be closely related to the
areas of Markov chain decomposability and of Markov modulated models. 

Gaussian Laws. Our main theoretical result, Th. 2, is of wide applicability in all situa- 
tions where the regular expression under consideration is “nondegenerate”. Roughly, as 
explained in §4, the overwhelming majority of regular expression patterns of interest 
in biological applications are expected to be nondegenerate. (Such is for instance the 
case for all the motifs that we have processed.) Additional work is called for regarding 
sufficient structural conditions for nondegeneracy in the case of Markov models. It is at 
any rate the case that the conditions of Th. 2 can be tested easily in any specific instance. 

Model sensitivity and robustness. An inspection of Table 1 suggests that the exceptional
motifs in the classification of Z-scores cover very different situations. While a ratio
$O/E$ of about 3 and an observable $O$ that is > 2000 is certainly significant, some
doubt may arise for other situations. For instance, is a discrepancy of only 5% on a
motif that is observed about $10^5$ times equally meaningful? To answer this question
it would be useful to investigate the way in which small changes in probabilities may
affect predictions regarding pattern occurrences. Our algebraic approach supported by
symbolic computation algorithms constitutes an ideal framework for investigating model
sensitivity, that is, the way predictions are affected by small changes in letter or transition
probabilities.

Abstract. One of the most important open problems in computational molecular
biology is the prediction of the conformation of a protein based on its amino acid 
sequence. In this paper, we design approximation algorithms for structure predic- 
tion in the so-called HP side chain model. The major drawback of the standard 
HP side chain model is the bipartiteness of the cubic lattice. To eliminate this 
drawback, we introduce the extended cubic lattice which extends the cubic lattice 
by diagonals in the plane. For this lattice, we present two linear algorithms with 
approximation ratios of 59/70 and 37/42, respectively. The second algorithm 
is designed for a ‘natural’ subclass of proteins, which covers more than 99.5% 
of all sequenced proteins. This is the first time that a protein structure predic- 
tion algorithm is designed for a ‘natural’ subclass of all combinatorially possible 
sequences. 



1 Introduction 

One of the most important open problems in molecular biology is the prediction of the 
spatial conformation of a protein from its sequence of amino acids. The classical methods 
for structure analysis of proteins are X-ray crystallography and NMR-spectroscopy. 
Unfortunately, these techniques are too slow and complex for a structure analysis of 
a large number of proteins. On the other hand, due to the technological progress, the 
sequencing of proteins is relatively fast, simple, and inexpensive. Therefore, it becomes 
more and more important to develop efficient algorithms for determining the 3-dimen- 
sional structure of a protein based on its sequence of amino acids. 

1.1 Protein Folding and the HP Model 

A protein is a linear chain of amino acids linked together by peptide bonds. An amino 
acid consists of a common main chain part and one of twenty residues, which determines 
its characteristic. The sequence of amino acids for a given protein is called its primary 
structure. Each natural protein folds into a unique spatial conformation called its tertiary 
structure. From the thermodynamic hypothesis it is assumed that the unique tertiary 
structure of a protein is the conformation with the minimal free energy. Experiments
have shown that the folding process in vitro is independent of external influence (in
folding in vivo, helper molecules called chaperones are sometimes involved). It seems
that the tertiary structure of a protein is encoded in its primary structure. Under this 
hypothesis, the spatial conformation of a protein can be computationally determined 
from its sequence of amino acids. 

It is assumed that the hydrophobicity of amino acids is the main force for the
development of a unique conformation. All natural proteins form one or more hydrophobic
cores, i.e., the more hydrophobic amino acids are concentrated in compact cores,
whereas the more hydrophilic amino acids are located at the surface of the protein. This
leads to a more simplified model, the so-called HP model (see, e.g., Dill [4] and Dill
et al. [5]). Here, we distinguish only between two types of amino acids: hydrophobic
(or non-polar) and hydrophilic (or polar). Therefore, a protein is modeled as a string
over $\{H, P\}$, where each hydrophobic amino acid is represented by an $H$ and each
polar one by a $P$. In the following, a string in $\{H, P\}^*$ will also be called an
HP-sequence.

The 3-dimensional space will be discretized by a cubic lattice. More formally, let
$\mathcal{L}_k$, for $k \in \mathbb{N}$, be the following graph

$$\mathcal{L}_k = \Bigl( \mathbb{Z}^3,\ \bigl\{ \{x, x'\} : x, x' \in \mathbb{Z}^3 \wedge |x - x'|_2 \le \sqrt{k} \bigr\} \Bigr),$$

where $|\cdot|_2$ is the usual Euclidean norm. Then $\mathcal{L}_1$ is the cubic lattice. A folding of a
protein can be viewed as a self-avoiding path in the cubic lattice. More formally, a
folding of an HP-sequence $\sigma = \sigma_1 \cdots \sigma_n$ is a one-to-one mapping $\phi : [1:n] \to \mathcal{L}_k$ such that
$|\phi(i-1) - \phi(i)|_2 \le \sqrt{k}$ for all $i \in [2:n]$. The score of a folding is the number of adjacent
pairs of hydrophobic amino acids in the cubic lattice which are not adjacent in the
given primary structure. Thus, the expected spatial conformation of a given protein is a
folding with the largest score, since the negative score models the free energy. Therefore,
a folding of a protein with a maximal score is called a conformation.
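The definitions of folding and score can be made concrete with a short sketch. The function names and the representation of lattice points as integer triples are assumptions made for illustration; comparing squared distances against $k$ avoids computing $\sqrt{k}$.

```python
from itertools import combinations

Point = tuple[int, int, int]

def sq_dist(a: Point, b: Point) -> int:
    """Squared Euclidean distance between two lattice points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def is_folding(points: list[Point], k: int = 1) -> bool:
    """One-to-one mapping whose consecutive points are adjacent in the lattice L_k."""
    one_to_one = len(set(points)) == len(points)
    connected = all(sq_dist(a, b) <= k for a, b in zip(points, points[1:]))
    return one_to_one and connected

def score(sequence: str, points: list[Point], k: int = 1) -> int:
    """Count H-H pairs adjacent in L_k that are not adjacent in the sequence."""
    if len(sequence) != len(points) or not is_folding(points, k):
        raise ValueError("not a valid folding")
    return sum(
        1
        for i, j in combinations(range(len(sequence)), 2)
        if j - i > 1
        and sequence[i] == sequence[j] == "H"
        and sq_dist(points[i], points[j]) <= k
    )
```

For instance, folding `"HPPH"` around a unit square, `[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]`, brings the two hydrophobic ends next to each other and yields a score of 1 on the cubic lattice.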

The major disadvantage of the HP model is the representation of the 3-dimensional
space by a cubic lattice, because it is a bipartite graph. Thus, two hydrophobic
amino acids with an even distance in the protein cannot contribute to the score, since
they cannot be adjacent in the cubic lattice. In particular, all foldings of the sequence
$(HP)^n$ are optimal, although each folding on the cubic lattice has score 0. Hence, we are
interested in a more natural discretization of the 3-dimensional space. In this paper, we
consider the extended cubic lattice. In the extended cubic lattice we add to each lattice
point 12 neighbors using diagonals in the plane, i.e., each lattice point has 18 neighbors.
More formally, $\mathcal{L}_2$ is the mathematical description of the extended cubic lattice. Note
that in $\mathcal{L}_2$ lattice points along a space diagonal are not connected.
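The neighbor counts and the bipartiteness argument can be checked with a few lines. This is a minimal sketch; the function name is hypothetical, and the enumeration over offsets in $\{-1, 0, 1\}^3$ is only valid for $k \le 3$.

```python
from itertools import product

def neighbor_offsets(k: int) -> list[tuple[int, int, int]]:
    """Offsets d with 0 < |d|^2 <= k, i.e. the neighbors of the origin in L_k (k <= 3)."""
    if not 1 <= k <= 3:
        raise ValueError("this enumeration only covers k in {1, 2, 3}")
    return [
        d for d in product((-1, 0, 1), repeat=3)
        if 0 < sum(c * c for c in d) <= k
    ]

cubic = neighbor_offsets(1)      # axis-parallel steps only
extended = neighbor_offsets(2)   # plus the diagonals in the plane

# In L_1 every step flips the parity of x + y + z, so the lattice is bipartite.
cubic_is_parity_flipping = all(sum(d) % 2 == 1 for d in cubic)

# In L_2 some steps keep the parity, e.g. (1, 1, 0); the points (0,0,0), (1,0,0)
# and (1,1,0) are pairwise adjacent and form a triangle, so L_2 is not bipartite.
extended_has_parity_preserving_step = any(sum(d) % 2 == 0 for d in extended)
```

Here `len(cubic)` is 6 and `len(extended)` is 18, matching the counts in the text, while the space diagonals such as `(1, 1, 1)` only appear for `k = 3`.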

A natural extension of the HP model is the HP side chain model. This is a more 
realistic model where the residues will be explicitly represented. In terms of graph 
theory, a protein is modeled as a caterpillar graph instead of a linear chain. A caterpillar 
of length $n$ is the following graph $(B \cup L, E)$, where

$$B = \{b_1, \ldots, b_n\}, \qquad L = \{\ell_1, \ldots, \ell_n\},$$

$$E = \{\{b_i, \ell_i\} : i \in [1:n]\} \cup \{\{b_{i-1}, b_i\} : i \in [2:n]\}.$$

Here, the set $B$ represents the nodes in the backbone and $L$ the so-called legs. A backbone
node represents the α carbon atom together with the main chain part of the amino acid
whereas the leg represents its characteristic residue. This is still a simplification, since 
the residue can be as simple as a hydrogen atom in Glycine and as complex as two
aromatic rings in Tryptophan. Note that we only mark the legs as hydrophobic or polar. 
Hence, a backbone node cannot increase the score of a folding. 
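The caterpillar and the rule that only legs carry the H/P marking can be sketched as follows. The names are hypothetical, the leg positions are assumed to come from a valid folding, and every pair of hydrophobic legs on adjacent lattice points is counted, including legs of consecutive residues; that last point is an assumption, since the text does not spell out the counting rule for the side chain model.

```python
from itertools import combinations

def caterpillar_edges(n: int) -> list[tuple[str, str]]:
    """Edges of a caterpillar of length n: one leg per backbone node plus the spine."""
    legs = [(f"b{i}", f"l{i}") for i in range(1, n + 1)]
    spine = [(f"b{i - 1}", f"b{i}") for i in range(2, n + 1)]
    return legs + spine

def side_chain_score(sequence: str, leg_points: list[tuple[int, int, int]], k: int = 2) -> int:
    """Count pairs of hydrophobic legs placed on adjacent points of L_k."""
    h_points = [p for symbol, p in zip(sequence, leg_points) if symbol == "H"]
    return sum(
        1
        for a, b in combinations(h_points, 2)
        if sum((x - y) ** 2 for x, y in zip(a, b)) <= k
    )
```

A caterpillar of length $n$ has $2n$ nodes and $2n - 1$ edges, so `len(caterpillar_edges(4))` is 7. Backbone positions never enter `side_chain_score`, in line with the remark that a backbone node cannot increase the score.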

1.2 Related Results 

It is widely believed that the computational task of predicting the spatial structure of a 
given polymer (or, in particular, a protein) requires exponential time. First evidence for 
this assumption has been established by proving that the prediction of the conformation 
of a polymer for some more or less realistic combinatorial models is $\mathcal{NP}$-hard (see, e.g.,
Ngo and Marks [11], Unger and Moult [15], and Fraenkel [6]). For a comprehensive 
discussion of these lower bounds, we refer the reader to the survey of Ngo, Marks, and 
Karplus [12]. 

In [13], Paterson and Przytycka show that for an extended HP model with an infinite
number of different hydrophobic amino acids it is $\mathcal{NP}$-hard to determine the conformation.
In the extended HP model a protein will be modeled as a string over the (arbitrarily
large) alphabet $\{P, H_1, H_2, \ldots\}$. Here only pairs of adjacent hydrophobic amino
acids of the same type (i.e., contacts of the form $H_i$-$H_i$) contribute to the score. Recently,
Nayak, Sinclair, and Zwick [10] improved this result: even for a constant (but
quite large) number of different types of amino acids the problem remains $\mathcal{NP}$-hard.
Moreover, they proved that this problem is hard to approximate by showing its MAXSNP-hardness.
More recently, Crescenzi et al. [3] as well as Berger and Leighton [2] have
shown independently that it is $\mathcal{NP}$-hard to determine the conformation in the HP model.

On the other hand, there is also progress on positive results on protein structure prediction.
As a first milestone, Hart and Istrail exhibit in [7,8] an approximation algorithm
for protein folding reaching at least 3/8 of the optimal score in the HP model on the
usual cubic lattice $\mathcal{L}_1$. In [9], the same authors present an approximation with a ratio of
at least 2/5 in the HP side chain model on the cubic lattice. In [1], Agarwala et al. presented
an algorithm with an approximation ratio of 3/5 in the HP model on the so-called
triangular lattice (also known as face centered cubic lattice). This was the first approach
to investigate non-bipartite lattices. Although the triangular lattice is differently defined,
it can be topologically viewed as a superset of $\mathcal{L}_1$ and a subset of $\mathcal{L}_2$. An extension of the
cubic lattice by just one plane diagonal direction in all three 2-dimensional subspaces
is topologically isomorphic to the triangular lattice. Thus, in the triangular lattice each
lattice point has 12 neighbors. Later, Hart and Istrail constructed in [9] a 31/36 approximation
for the HP side chain model on triangular lattices. Note that the quality of all
these approximation algorithms is measured with asymptotic approximation ratios.

1.3 Our Results

In this paper, we investigate protein folding on extended cubic lattices. The extended
cubic lattice is a natural extension of the cubic lattice which bypasses its major drawback,
its bipartiteness. First we present a general folding algorithm A which achieves for
all protein sequences an approximation ratio of 59/70 (≈84.3%). Then we describe a
special folding algorithm B which can be applied to a restricted subset of HP-sequences.
With the second algorithm we obtain an approximation ratio of 37/42 (≈88.1%). Although
it is difficult to compare the approximation ratios for protein structure prediction
algorithms on different lattice models, it should be mentioned that this is the best known
approximation ratio for such algorithms.

Former protein structure prediction algorithms construct ‘layered’ foldings. This
means that the algorithms in reality construct a folding in the 2-dimensional sublattice
from which the final folding in the 3-dimensional lattice will be generated. Therefore,
only a few bonds use the third dimension. To obtain the high quality of the presented
folding algorithm B, it is essential to construct non-layered foldings in most parts of
the conformation. Moreover, this construction does not only depend on the distribution of
the hydrophobic amino acids in the protein, as in former algorithms. It also strongly depends
on the length of contiguous subsequences of polar residues. This is strong evidence that
the predicted folding is not too artificial.

On the other hand, this is the first time that folding algorithms for a ‘natural’ subclass
of HP-sequences have been investigated. A strong indication that the considered subclass
of HP-sequences is a ‘natural choice’ is the fact that more than 99.5% of all known
sequences of proteins in the protein database SWISS-PROT [16] belong to the considered
subclass. Finally, the running times of both approximation algorithms are linear.



2 The General Folding Algorithm 

In this section, we present a general folding algorithm in the HP side chain model on
extended cubic lattices. Let $s = s_1 \cdots s_n$ be an HP-sequence. A sequence of HP-sequences
$\sigma = (\sigma_1, \ldots, \sigma_m)$ is called a $k$-decomposition of $s$ iff the following four conditions hold:

1. $s = \sigma_1 \cdots \sigma_m$,
2. $|\sigma_i|_H = k$ for all $i \in [2:m-1]$,
3. $0 < |\sigma_1|_H \le k$ and $|\sigma_m|_H \le k$,
4. the last symbol in each $\sigma_i$ is an $H$ for all $i \in [1:m-1]$.

Here $|s|_H$ is the number of H’s in the sequence $s$. The strings $\sigma_i$ of a $k$-decomposition
$(\sigma_1, \ldots, \sigma_m)$ are called $k$-fragments. If $|\sigma_1|_H = k$, we call $\sigma$ the canonical
$k$-decomposition.
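The four conditions above suggest a simple linear scan for the canonical $k$-decomposition. The sketch below is one possible reading, not the paper's procedure: the function name is hypothetical, and it is assumed that the final fragment collects whatever remains (fewer than $k$ H's, possibly only trailing P's).

```python
def canonical_decomposition(s: str, k: int) -> list[str]:
    """Cut s after every k-th H; the last fragment holds the remainder, if any."""
    if k < 1 or set(s) - {"H", "P"}:
        raise ValueError("expected k >= 1 and a string over {H, P}")
    fragments, current, h_count = [], [], 0
    for symbol in s:
        current.append(symbol)
        if symbol == "H":
            h_count += 1
            if h_count == k:          # fragment is complete and ends with an H
                fragments.append("".join(current))
                current, h_count = [], 0
    if current:                       # remainder with fewer than k H's
        fragments.append("".join(current))
    return fragments
```

For `"PHHPPHPHP"` and `k = 2` this yields the fragments `"PHH"`, `"PPHPH"` and `"P"`. If the sequence contains fewer than $k$ H's in total, the single fragment returned has $|\sigma_1|_H < k$, so it is a $k$-decomposition but not the canonical one in the sense of the definition.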

Let $s$ be an HP-sequence and let $\sigma = (\sigma_1, \ldots, \sigma_m)$ be the canonical 5-decomposition
of $s$. First we fold each $\sigma_i$ as shown in Fig. 1. Here, the nodes on the backbone of the
protein are drawn as circles. More precisely, a backbone node is drawn black if it represents
a hydrophobic amino acid and white otherwise. Hydrophobic residues are drawn as
black squares, whereas the polar residues are not explicitly marked. The numbers in front
of the squares represent the order of the hydrophobic residues in the sequence of amino
acids. The contiguous blocks of polar amino acids between two hydrophobic amino acids
are not connected in Fig. 1. From the numbering of the hydrophobic residues, it should
be clear which strands have to be connected and in which way.

Fig. 1. Folding of a single 5-fragment and arrangement to a pole

We observe that for each 5-fragment consecutive backbone nodes with a hydrophobic 
residue are placed at neighboring lattice points, with the exception of the third and fourth
backbone node. Therefore, the folding of the 5-fragment is still admissible even if the 
P-sequence between two hydrophobic residues is empty. If there is no polar residue 
between the third and fourth hydrophobic residue of a 5-fragment, we just remap the 
backbone node of the fourth hydrophobic residue one position up in the vertical direction. 

In what follows, we show how to combine this folding of 5-fragments to obtain 
a folding in the 3-dimensional space. Using the third dimension, we combine the 5- 
fragments to a pole of height m such that the corresponding hydrophobic residues form 
a vertical column. This will be achieved by arranging the layers in a zig-zag-style in the 
third dimension. This is sketched in Fig. 1 where only the hydrophobic residues are drawn 
explicitly as black circles. Note that at the front half of this pole the three hydrophobic 
residues have no neighbors outside the pole. Using a turn after m/2 layers, we combine 
the two halves to a new pole such that each layer contains 10 hydrophobic residues. A 
simple computation shows that each layer of 10 hydrophobic residues contributes 59 to 
the score: 23 H-H contacts within a layer and 36 H-H contacts to the two neighboring 
layers. 

Clearly, each lattice point has exactly 18 neighbors. Thus, each hydrophobic residue
can have at most 17 contacts with other hydrophobic neighbors. This upper bound on the 
number of hydrophobic neighbors of a hydrophobic residue can be improved as follows. 
We denote by a loss an edge in the lattice with the property that a hydrophobic residue 
is mapped to exactly one of its endpoints. 

Lemma 1. For each folding in the extended cubic lattice, a single hydrophobic residue
is on average incident to at least 3 losses.

Proof. Consider a backbone vertex $b \in B$ and its adjacent hydrophobic residue $\ell \in L$.
Assume that $b$ and $\ell$ are mapped to adjacent lattice points $p$ and $q$, respectively. There
exist at least 6 lattice points $r_i$, for $i \in [1:6]$, such that $r_i$ is adjacent to both $p$ and $q$. In the
following, we consider a fixed (but arbitrary) lattice point $r_i$. Either a hydrophobic residue
is assigned to $r_i$ or not. In the first case, there is a loss along edge $\{p, r_i\}$; in the latter
case, there is a loss along edge $\{q, r_i\}$. Since each loss along an edge is counted at most
twice, each hydrophobic residue is on average incident to at least 3 losses. □

Note that in general a single hydrophobic residue can have 17 hydrophobic neighbors.
But in this case the neighbors have 3 additional losses, implying that on average each
hydrophobic residue has at least 3 losses. It follows from the lemma that each hydrophobic
residue can contribute at most 7 to the score of a folding. Our construction
together with the previous lemma leads to the following theorem. Note that we consider
asymptotic approximation ratios in this paper.
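One way to read the bound of 7 and the resulting ratio is sketched below. It assumes that the 3 losses are subtracted from the 17 possible hydrophobic neighbors and that each H-H contact is shared between two residues; the layer contribution of 59 for 10 hydrophobic residues is taken from the construction above.

```latex
\[
  \frac{17 - 3}{2} = 7,
  \qquad
  \frac{59}{10 \cdot 7} = \frac{59}{70} \approx 0.843 .
\]
```

The ratio compares the score of one layer of the constructed pole with the largest score that 10 hydrophobic residues could reach under this bound.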

Theorem 2. Algorithm A constructs a folding in the HP side chain model on extended
cubic lattices for an arbitrary HP-sequence with an approximation ratio of at least
59/70 (≈84.3%). Moreover, this folding can be computed in linear time.

3 The Improved Folding Algorithm 

In this section, we describe an improved folding algorithm B. This algorithm is designed
for a special subset of HP-sequences. Let $s$ be an HP-sequence and let $\sigma = (\sigma_1, \ldots, \sigma_m)$ be
a 6-decomposition of $s$. Further, let $\sigma_i = P^{\ell_1} H \cdots P^{\ell_6} H$ be a 6-fragment. We call $\sigma_i$ perfect
iff there exists $j \in [2:6]$ such that $\ell_j = 0$, or there exist distinct $j, j' \in [1:6]$ such that $\ell_j + \ell_{j'} \le 3$. An
HP-sequence is called perfect if it has a 6-decomposition such that all its 6-fragments are
perfect. If it has a 6-decomposition such that all but one of its 6-fragments are perfect,
the HP-sequence is called nearly perfect. The substrings $P^{\ell_j}$ for $j \in [1:6]$ are called an $\ell_j$-block
at position $j$. For example, the 6-fragment $\sigma = P^{[\ldots]} H P^{[\ldots]} H P^{[\ldots]} H P^{12} H P^{[\ldots]} H P^{[\ldots]} H$
is perfect and has a 12-block at position 4.
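The perfectness test for a single 6-fragment can be sketched as below. The index range of the second condition is damaged in the source, so this sketch assumes that it ranges over two distinct block positions in $[1:6]$ whose lengths sum to at most 3, which agrees with the three cases treated in the construction that follows. The function names are hypothetical.

```python
import re
from itertools import combinations

def block_lengths(fragment: str) -> list[int]:
    """Return [l_1, ..., l_6] for a fragment of the form P^l1 H ... P^l6 H."""
    if not re.fullmatch(r"(P*H){6}", fragment):
        raise ValueError("expected a 6-fragment over {H, P} ending in H")
    # Splitting at each H leaves the six P-runs plus one empty tail.
    return [len(run) for run in fragment.split("H")[:-1]]

def is_perfect(fragment: str) -> bool:
    """Check the (assumed) perfectness conditions for one 6-fragment."""
    lengths = block_lengths(fragment)
    if any(length == 0 for length in lengths[1:]):   # 0-block at a position in [2:6]
        return True
    # Two distinct blocks whose lengths sum to at most 3 (assumed reading).
    return any(a + b <= 3 for a, b in combinations(lengths, 2))
```

Under this reading, a fragment whose six blocks all have length 4 is not perfect, while a fragment with a 1-block and a 2-block anywhere, or a 0-block at position 1 together with a 3-block, is.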

Again, we first describe how to fold a single 6-fragment. We will use two adjacent 
2-dimensional planes to achieve the folding. In each plane, we will place 3 hydrophobic 
residues. We distinguish three cases depending on whether the 6-fragment is perfect 
because of a 0-block at position greater than 1, a combination of a 0-block at position 1 
and a 3-block, or a combination of a 1- and a 2-block. 

Case 1: First, we assume that the 6-fragment is perfect because of a 0-block at 
position $i > 1$. The folding is illustrated in Fig. 2. In Fig. 2a and 2b the foldings for
a 6-fragment with a 0-block at position 2 and 3, respectively, are shown. The folding 
will be completed as illustrated in Fig. 2d. In Fig. 2c the first part of the folding of a 
6-fragment with a 0-block at position 4 is shown. This folding will be completed by a 
reverse traversal of the same folding given in Fig. 2c in the next layer. The case where the 
0-block is at position 5 or 6 is symmetric to the cases where the 0-block is at position 2
or 1, respectively. In contrast to the folding in the previous section, the folding of a 
6-fragment consists of two layers with three hydrophobic residues each. In both layers 
the hydrophobic residues form a triangle. The narrow dotted horizontal lines in Fig. 2 
indicate where the 6-fragment will be folded to obtain this construction. 





Fig. 3. Case 2: Folding of a 6-fragment with a 0-block at position 1 and a 3-block

Case 2: Now we consider the case of a 0-block at position 1. Figs. 3a, 3b, 3c,
3d, and 3e illustrate the folding if the 3-block is at position 2, 3, 4, 5, and 6, respectively.
Again, the narrow dotted horizontal line indicates where the folding will be folded to
obtain two layers. The dashed lines indicate edges of the caterpillar which arise between
adjacent layers. Note that we use here some area which is usually used to connect
the last hydrophobic amino acid of the previously considered fragment with the first
hydrophobic residue of the current fragment. In our construction, the used positions in
the previous layer from the last visited hydrophobic residue are identical. Hence a reuse
is possible and will not cause any difficulties.

Fig. 4. Case 3.1: Folding of a 6-fragment with a 1- and a 2-block

Case 3: Finally, we consider a combination of a 1- and a 2-block. Now we distinguish
3 subcases depending on whether at position 1 there is a $k$-block for some $k > 2$, a 1-block,
or a 2-block.

Case 3.1: The folding will be constructed from the partial foldings of a 6-fragment 
given in Fig. 4 and Fig. 2d. Table 1 shows how to combine these partial foldings. The rows 
and columns refer to the positions of the 1- and 2-block, respectively. The superscript R 
indicates that the combined folding is traversed in reverse order. For example, the folding 
of a 6-fragment of a 1-block at position 2 and a 2-block at position 5 is the combination 
of the foldings given in Fig. 4a and Fig. 4e. Note that for the combination of Fig. 4a with 
Fig. 4f a minor modification of the folding given in Fig. 4a is necessary. The backbone
node of the hydrophobic residue labeled with 3 has to be remapped just to the right of
the hydrophobic residue, which is obviously possible.

Table 1. Combinations of subfoldings to a folding of a 6-fragment (rows: position of the 1-block; columns: position of the 2-block; surviving entries: 4g+2d, 4a+4f, and one empty cell)

V. Heun

Case 3.2: Now we consider the case that the 1-block is at position 1 in the 6-fragment. 
Fig. 5a illustrates the folding if the 2-block is at position 2. The folding for a 2-block at 
position 3 is obtained by a combination of the foldings given in Fig. 5b and Fig. 2d. If 
the 2-block is at position 5 or 6, the folding will be combined from the foldings given 
in Fig. 5c and Fig. 4e or Fig. 4d, respectively. If the 2-block is at position 4, the folding
is more complex and is illustrated in Fig. 5d. Here, the dotted curves indicate connected 
subsequences of polar residues. Observe that the order of the traversed six hydrophobic 
residues is different from that in the other foldings. The last visited node is directly above 
the fourth visited node of this fragment instead of the first one. 

Case 3.3: There remains the case where the 2-block is at position 1 in the 6-fragment.
These are the most complex foldings and they are explicitly illustrated in Figs. 6a through 
6e depending on the position of the 1-block. 

Note that all foldings are drawn for the case that the subsequences of contiguous 
polar residues may be arbitrarily long. But nevertheless our construction is also valid for 
any length of subsequences of contiguous polar residues with some minor modifications. 

It remains to construct a complete folding based on the presented foldings of the 
6-fragments. First we combine the foldings of the 6-fragments to a long pole and break 
it into 4 parts $P_1, \ldots, P_4$ of equal height. Then the four parts will be arranged as shown
in Fig. 7. In Fig. 7 only the hydrophobic residues are represented by gray quarters of a
cylinder. For example, a folding of a single 6-fragment is illustrated in this figure by six
black circles. The connections between these four quarters are drawn as dashed curves.
In the final folding, each layer consists of 12 hydrophobic residues. Each layer of 12
hydrophobic residues contributes 74 to the general score: 30 H-H contacts within a
layer […]

Fig. 5. Case 3.2: Folding of a 6-fragment with a 1-block at position 1 and a 2-block

Theorem 3. Algorithm B constructs a folding in the HP side chain model on extended
cubic lattices for perfect HP-sequences with an approximation ratio of at least 37/42
(≈88.1%). Moreover, this folding can be computed in linear time.

It is possible to extend this embedding for nearly perfect HP-sequences. 

Theorem 4. Algorithm B constructs a folding in the HP side chain model on extended 
cubic lattices for nearly perfect HP-sequences with an approximation ratio of at least 
37/42 (≈ 88.1%). Moreover, this folding can be computed in linear time. 

An inspection of the protein database SWISS-PROT [16] shows that more than 97.5% 
of all stored proteins have a perfect 6-decomposition and more than 99.5% have a nearly 
perfect 6-decomposition. Thus, algorithm B is applicable to nearly all natural proteins. In 
our analysis, we marked the amino acids Ala, Cys, Phe, Ile, Leu, Met, Val, Trp, and Tyr as 
hydrophobic and all other amino acids as polar. This classification follows Sun et al. [14] 
and is a conservative classification in the sense that other classifications mark more amino 
acids as hydrophobic. Obviously, the more amino acids are marked as hydrophobic, the 
more proteins have a (nearly) perfect HP-sequence. The detailed analysis of amino acids 
in SWISS-PROT 36 as of July 1998 can be found in Table 2. Here, N(i) is the number 
of amino acids which have an optimal 6-decomposition with i imperfect 6-fragments. 
An optimal k-decomposition is a k-decomposition with a minimal number of imperfect 
k-fragments. 
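
The classification above can be written down as a small sketch. The Python code below is illustrative only: the function name to_hp and the use of one-letter amino-acid codes as input are assumptions made here, not part of the analysis described in the text.

```python
# One-letter codes of Ala, Cys, Phe, Ile, Leu, Met, Val, Trp, Tyr.
HYDROPHOBIC = set("ACFILMVWY")


def to_hp(sequence):
    """Map a protein given in one-letter codes to its HP-sequence.

    Residues in HYDROPHOBIC become 'H'; every other residue becomes 'P'.
    """
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())
```

For instance, to_hp("MKVLA") yields "HPHHH", since only Lys (K) is classified as polar.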
Abstract. The construction of full-text indexes on very large text collections is 
nowadays a hot problem. The suffix array [16] is one of the most attractive full-text 
indexing data structures due to its simplicity, space efficiency and the powerful/fast 
search operations it supports. In this paper we analyze, theoretically and experimentally, 
the I/O-complexity and the working space of six algorithms for constructing 
large suffix arrays. Additionally, we design a new external-memory algorithm that 
follows the basic philosophy underlying the algorithm in [13] but in a significantly 
different manner, thus combining its good practical qualities with efficient worst-case 
performance. To the best of our knowledge, this is the first study which 
provides a wide spectrum of possible approaches to the construction of suffix arrays 
in external memory, and thus it should be helpful to anyone who is interested 
in building full-text indexes on very large text collections. 



1 Introduction 

Full-text indexes, like suffix trees [17], suffix arrays [16] (cfr. PAT-arrays [13]), 
PAT-trees [13] and String B-trees [12], just to cite a few, have been designed to deal with 
arbitrary (unstructured) texts and to support powerful string-search queries (cfr. 
word-indexes [8]). They have been successfully applied to fundamental string-matching 
problems as well as to text compression, analysis of genetic sequences and, recently, to the 
indexing of special linguistic texts [11]. The most important complexity measures for 
evaluating their efficiency are [24]: (i) the time and the extra space required to build 
the index, (ii) the time required to search for a string, and (iii) the space used to store 
the index. Points (ii) and (iii) have been largely studied in the scientific literature (see 
e.g. [5,12,13,16,17]). In this paper, we will investigate the efficient construction of these 
data structures on very large text collections. This is nowadays a hot topic¹ because 
the construction phase may be a bottleneck that can even prevent these indexing tools 
from being used in large-scale applications. In fact, known construction algorithms are very 
fast when employed on textual data that fit in the internal memory of computers [3,16] 
but their performance immediately degenerates when the text size becomes so large that 
the texts must be arranged on (slow) external storage devices [5,12]. These algorithms 
suffer from the so-called I/O bottleneck: they spend most of the time moving data 
to/from the disk. 

* Part of this work was done while the second author had a Post-Doctoral fellowship at the 
Max-Planck-Institut für Informatik, Saarbrücken, Germany. The work has been supported by EU 
ESPRIT LTR Project N. 20244 (ALCOM-IT). 

¹ Zobel et al. [24] say that: “We have seen many papers in which the index simply *is*, without 
discussion of how it was created. But for an indexing scheme to be useful it must be possible 
for the index to be constructed in a reasonable amount of time”. 

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 224-235, 1999. 

© Springer-Verlag Berlin Heidelberg 1999 

To study the efficiency of algorithms that operate on very large text collections, 
we refer to the classical Parallel Disk Model [21,23]. Here a computer is abstracted 
to consist of a two-level memory: a fast and small internal memory, of size M, and 
a slow and arbitrarily large external memory, called disk. Data between the internal 
memory and the disk are transferred in blocks of size B (called disk pages). It is well 
known [20,24] that accessing one page from the disk decreases the cost of accessing 
the page succeeding it, so that “bulk” I/Os are less expensive per page than “random” 
I/Os. This difference becomes much more prominent if we also consider the 
read-ahead/buffering/caching optimizations which are common in current disks and operating 
systems. To deal with these disk peculiarities we therefore adopt the simple accounting 
scheme introduced in [10]: Let c < 1 be a constant; a bulk I/O is the reading/writing of a 
contiguous sequence of cM/B disk pages; a random I/O is any single disk-page access 
which is not part of a bulk I/O. The performance of the external-memory algorithms 
is therefore evaluated by measuring: (a) the number of I/Os (bulk and random), (b) the 
internal running time (CPU time), and (c) the number of disk pages used during the 
construction process (working space). 
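
The accounting scheme can be made concrete with a small sketch. The Python function below is illustrative only and is not taken from [10]: the name count_ios, the description of the accesses as lengths of contiguous page runs, and the rule that leftover pages of a run are charged as random I/Os are all assumptions made here.

```python
def count_ios(run_lengths, bulk_pages):
    """Classify disk accesses under the bulk/random accounting scheme.

    run_lengths: lengths (in pages) of the contiguous page runs accessed.
    bulk_pages:  number of pages in one bulk I/O (cM/B in the text).
    Returns the pair (bulk_ios, random_ios).
    """
    if bulk_pages <= 0:
        raise ValueError("bulk_pages must be positive")
    bulk_ios = 0
    random_ios = 0
    for length in run_lengths:
        full, rest = divmod(length, bulk_pages)
        bulk_ios += full     # every full chunk of bulk_pages pages is one bulk I/O
        random_ios += rest   # assumption: leftover pages count as single-page accesses
    return bulk_ios, random_ios
```

With bulk_pages = 64, three runs of 64, 1 and 130 pages are charged as 3 bulk I/Os and 3 random I/Os under these assumptions.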

Previous Work. For simplicity of exposition, we use N to denote the size of the whole 
text collection and assume throughout the paper that it consists of only one long text. The 
most famous indexing data structure is the suffix tree. In internal memory, a suffix tree 
can be constructed in O(N) time [17,9]; in external memory, Farach et al. [10] showed 
that a suffix tree can be optimally constructed within the same I/O-bound as sorting N 
atomic items; nonetheless, known practical construction algorithms for external memory 
still operate in a brute-force manner requiring O(N^2) total I/Os in the worst case [5]. 
Their working space is not predictable in advance, since it depends on the text structure, 
and requires between 15N and 25N bytes [15,16]. 

Since space occupancy is a crucial issue, Manber and Myers [16] proposed the suffix 
array data structure, which consists of an array of pointers to text positions and thus 
occupies overall 4N bytes. Suffix arrays can be efficiently constructed in O(N log_2 N) 
time [16] and O((N/B)(log_2 N) log_{M/B}(N/B)) I/Os [1]. The recent interest in suffix 
arrays is motivated by their simplicity, their reduced space occupancy 
and the small constants hidden in the big-Oh notation, which make them suitable 
for indexing very large text collections in practice. Suffix arrays also present some natural 
advantages over the other data structures as far as the construction phase is concerned. 
Indeed, their simple topology (i.e., an array of pointers) avoids at construction time the 
problems related to the efficient management of tree-based data structures (like suffix 
trees and String B-trees) on external storage devices [14]. Furthermore, efficient practical 
procedures for building suffix arrays are definitely useful for efficiently constructing 
suffix trees, String B-trees and the other full-text indexing data structures. 

Our Contribution. With the exception of some preliminary and partial experimental 
works [16,13,18], to the best of our knowledge, no full-range comparison exists among 
the known algorithms for building large suffix arrays. This will be the main goal of 
our paper, where we will theoretically study and experimentally analyze six suffix-array 
construction algorithms. Some of them are the state of the art in the practical 
setting [13], others are the most efficient theoretical ones [16,1], whereas three other 
algorithms are our new proposals obtained either as slight variations of the previous 
ones or as a careful combination of known techniques which were previously employed 
only in the theoretical setting. In the design of the new algorithms we will address mainly 
two issues: (i) simple algorithmic structure, and (ii) reduced working space. The first 
issue has clearly an impact on the predictability and practical efficiency of the proposed 
algorithms. The second issue is important because the real disk size is limited and “space 
optimization is closely related to time optimization in a disk memory” [14, Sect. 6.5]. 

We will discuss all the algorithms according to these two resources and we will 
pay particular attention to differentiating between random and bulk I/Os in our theoretical 
analysis. The adopted accounting scheme allows us to reasonably explain some 
interesting I/O-phenomena which arise during the experiments and which would be 
otherwise meaningless in the light of other simpler external-memory models. As a result, 
we will give a precise hierarchy of suffix-array construction algorithms according 
to their working-space vs. construction-time tradeoff, thus providing a wide spectrum 
of possible approaches for anyone who is interested in building large full-text indexes. 

The experimental results have finally driven us to deeply study the intriguing, and 
apparently counterintuitive, “contradiction” between the effective practical performance 
of one of the tested algorithms, namely the algorithm in [13], and its unappealing 
(i.e., cubic) worst-case behavior. This study has led us to devise a new construction 
algorithm that follows the basic philosophy of [13] but in a significantly different manner, 
thus resulting in a novel approach which combines good practical qualities with efficient 
worst-case performance. 



2 The Suffix Array Data Structure 

The suffix array SA built on a text T[1, N] is an array containing the lexicographically 
ordered sequence of suffixes of T, represented via pointers to their starting positions (i.e., 
integers). For instance, if T = ababc then SA = [1, 3, 2, 4, 5]. SA occupies 4N bytes if 
N < 2^32. In this paper we consider three well-known algorithms for constructing suffix 
arrays, called Manber-Myers [16] (MM), BaezaYates-Gonnet-Snider [13] (BGS) and 
Doubling [1], and we refer the reader to the corresponding literature for their algorithmic 
details, due to space limitations. We now concentrate on our three new proposals, describe 
their features (Section 2.1), evaluate their complexities (Table 1), and finally compare all 
of them via a set of experiments on two text collections (Section 3). The last section, Section 4, 
will be dedicated to the description of an improvement to the algorithm in [13] (called 
new BGS). 
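
The definition can be checked directly on small inputs. The following Python sketch is not one of the algorithms studied here: it simply sorts all suffixes in main memory, which is far too slow and too space-hungry for large texts, and the function name is an assumption made for illustration.

```python
def naive_suffix_array(text):
    """Return the 1-based suffix array of text by sorting its suffixes directly."""
    # Position i (1-based) stands for the suffix T[i, N] = text[i - 1:].
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])
```

For T = ababc the sorted suffixes are ababc, abc, babc, bc, c, so naive_suffix_array("ababc") returns [1, 3, 2, 4, 5], as in the example above.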



2.1 Three New Algorithms 

The three proposed algorithms asymptotically improve the previously known 
ones [16,13,1] by offering better trade-offs between total number of I/Os and working 
space. Their algorithmic structure is simple because it is based only upon sorting and 
scanning routines. This feature has two immediate advantages: The algorithms are 
expected to be fast in practice because they can benefit from the prefetching/caching of the 
disk; and they can be easily adapted to work efficiently on D-disk arrays and clusters of 
P workstations by plugging in proper sorting/scanning routines [21,23] (cfr. [18]). 

Algorithm          Working space           CPU-time           Total number of I/Os
MM [16]            8N                      N log_2 N          N log_2 N
BGS [13]           8N                      (N^3 log_2 M)/M    n^2 (N log_2 M)/m
Doubl [1]          24N                     N (log_2 N)^2      n (log_m n) log_2 N
Doubl+Disc         24N                     N (log_2 N)^2      n (log_m n) log_2 N
Doubl+Disc+Radix   12N                     N (log_2 N)^2      n (log_{m/log N} N) log_2 N
Constr. L pieces   max{24N/L, 2N + 8N/L}   N (log_2 N)^2      n (log_m n) log_2 N
New BGS            8N                      N^2 (log_2 M)/M    n^2/m

Table 1. CPU-time and I/Os are expressed in big-Oh notation; the working space is evaluated 
exactly; L is an integer constant > 1; n = N/B and m = M/B. BGS and new-BGS operate 
via sequential disk scans, whereas all the other algorithms mainly execute random I/Os. Notice 
that with a tricky implementation, the working space of BGS can be reduced to 4N. The last four 
lines of the table indicate our new proposals. 

Doubling combined with a discarding stage. The main idea of the doubling algorithm [1] 
is to assign names (i.e., small integers) to the power-of-two length substrings of 
T[1, N] in order to satisfy the so-called lexicographic naming property: Given two text 
substrings α and β of length 2^h, we have α <_L β (i.e., α precedes β lexicographically) 
if and only if the name of α is smaller than 
the name of β. These names are computed inductively in [1] by exploiting the following 
observation: the lexicographic order between any two substrings of length 2^h can be 
obtained by exploiting the (inductively known) lexicographic order between their two 
(disjoint) substrings of length 2^(h−1). After q = O(log_2 N) stages, the order between any 
two suffixes of T, say T[i, N] and T[j, N], can be derived in constant time by comparing 
the names of T[i, i + 2^q − 1] and T[j, j + 2^q − 1]. 
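
The naming idea can be illustrated with a short in-memory sketch. The Python code below is not the external-memory algorithm of [1]: it sorts in main memory, and the function name and the use of name 0 for positions past the end of the text are assumptions made for this illustration.

```python
def suffix_array_by_doubling(text):
    """Return the 1-based suffix array of text via prefix doubling (in memory)."""
    n = len(text)
    if n == 0:
        return []
    # Stage 0: the name of a substring of length 1 is the rank of its character.
    # Name 0 is reserved for "past the end of the text".
    rank_of = {ch: r + 1 for r, ch in enumerate(sorted(set(text)))}
    name = [rank_of[ch] for ch in text]
    length = 1
    while True:
        def key(i):
            # A substring of length 2 * length is described by the names of
            # its two halves of the current length.
            second = name[i + length] if i + length < n else 0
            return (name[i], second)

        order = sorted(range(n), key=key)
        new_name = [0] * n
        new_name[order[0]] = 1
        for prev, cur in zip(order, order[1:]):
            # Equal pairs get equal names; a larger pair gets the next name.
            new_name[cur] = new_name[prev] + (key(cur) != key(prev))
        name = new_name
        length *= 2
        if name[order[-1]] == n:
            # All names are distinct, so order is the suffix array (0-based).
            return [i + 1 for i in order]
```

Each pass of the loop doubles the length of the named substrings, so the loop runs O(log_2 N) times; the external-memory version replaces the in-memory sort with external sorting and scanning.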

Our first new algorithm is based on the following observation: In each stage h of the 
doubling approach, all the text substrings of length 2^h are considered although the final 
position in SA of some of their corresponding suffixes might be already known. Our 
main idea is therefore to identify and “discard” all those substrings from the naming 
process, thus reducing the overall number of items ordered at each stage. However, this 
discarding step is not easy to implement because some of the discarded substrings 
might be necessary in later stages for computing the names of other longer substrings. 
In what follows, we describe how to cope with this problem. 

The algorithm inductively keeps two lists of tuples: FT (finished tuples) and UT 
(unfinished tuples). The former is a list of tuples (pos, −1, i) denoting the suffixes 
T[i, N] whose final position in SA is known: SA[pos] = i. UT is a list of tuples (x, y, i) 
denoting the suffixes T[i, N] whose final position is not yet known. Initially, UT contains 
the tuples (0, T[i], i), for 1 ≤ i ≤ N; FT is empty. The algorithm executes ≤ log_2 N 
stages; each stage j consists of six steps: 

1. Sort the tuples in UT according to their first two components. If UT is empty go to 
step 6. 

2. Scan UT, identify the “finished” tuples and assign new names to all tuples in UT. 
Formally, a tuple is considered “finished” if it is preceded and followed by two tuples 
which are different in at least one of their first two components; in this case, the algorithm 
sets the second component of this tuple to −1. The new names for all tuples are computed 
differently from [1] by setting the first component of a tuple t = (x, y, *) equal to (x + c), 
where c is the number of tuples that precede t in UT and have the form (x, y', *) with 
y' ≠ y. 

3. Sort UT according to the third component of its tuples. 

4. Merge the lists UT and FT according to the third component of their tuples. UT contains 
the final merged sequence, whereas FT is emptied. 

5. Scan UT and for each not-finished tuple t = (x, y, i) (i.e., y ≠ −1), take the next 
tuple at distance 2^j (say (x', *, i + 2^j)) and set t equal to (x, x', i). If a tuple is marked 
“finished” (i.e., y = −1), then it is discarded from UT and put into FT. Finally, set 
j = j + 1 and go to step 1. 

6. Sort FT according to the first component of its tuples (UT is empty); and derive SA 
by reading rightwards the third component of the sorted tuples. 
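
The six steps can be traced with the following in-memory sketch. It is an illustration under stated assumptions, not the implementation evaluated in this paper: Python lists and in-memory sorts stand in for the external sorting and scanning routines, the function name is hypothetical, and a sentinel smaller than every character is appended to the text so that every unfinished tuple always has a partner 2^j positions to its right. Names are counted from 0.

```python
def suffix_array_doubling_discard(text):
    """In-memory sketch of doubling with discarding; returns the 1-based SA."""
    n = len(text)
    # Assumption: a sentinel with code 0 sits at position n + 1; real characters
    # get codes >= 1, so the sentinel suffix is the smallest suffix.
    codes = [ord(ch) + 1 for ch in text] + [0]
    ut = [(0, codes[i - 1], i) for i in range(1, n + 2)]  # unfinished (x, y, i)
    ft = []                                               # finished (pos, -1, i)
    j = 0
    while True:
        # Step 1: sort UT by its first two components; stop when UT is empty.
        ut.sort(key=lambda t: (t[0], t[1]))
        if not ut:
            break
        # Step 2: rename all tuples; a tuple with a unique (x, y) is finished.
        renamed = []
        k = 0
        while k < len(ut):
            x = ut[k][0]
            g = k
            while g < len(ut) and ut[g][0] == x:
                g += 1                    # ut[k:g] share the first component x
            r = k
            while r < g:
                s = r
                while s < g and ut[s][1] == ut[r][1]:
                    s += 1                # ut[r:s] share the pair (x, y)
                c = r - k                 # preceding tuples (x, y', *) with y' != y
                for (_, y, i) in ut[r:s]:
                    renamed.append((x + c, -1 if s - r == 1 else y, i))
                r = s
            k = g
        # Steps 3 and 4: order by text position and merge with FT.
        merged = sorted(renamed + ft, key=lambda t: t[2])
        ft = []
        # Step 5: pair each unfinished tuple with the name 2^j positions later.
        ut = []
        for (x, y, i) in merged:
            if y == -1:
                ft.append((x, -1, i))
            else:
                ut.append((x, merged[i - 1 + (1 << j)][0], i))
        j += 1
    # Step 6: sort FT by final position and read off the text positions.
    ft.sort()
    return [i for (_, _, i) in ft if i <= n]  # drop the sentinel suffix
```

On T = ababc the sketch finishes the tuples for positions 5 and 6 (the sentinel) in stage 0, those for positions 2 and 4 in stage 1, and those for positions 1 and 3 in stage 2, and returns [1, 3, 2, 4, 5].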

The correctness follows from the invariant (proof in the full version): At a generic stage j 
and after step 2, we have that in any tuple t = (x, y, i) the parameter x denotes the 
number of text suffixes whose prefix of length 2^j is strictly smaller than T[i, i + 2^j − 1]. 
The algorithm has the same I/O-complexity as the Doubling algorithm (see Table 1), 
but we expect that the discarding step helps in improving its practical performance by 
reducing the overall number of tuples on which the algorithm is called to operate at 
each stage. In our implementation, we stuff four characters T[i, i + 3] into each tuple 
(instead of the single T[i]) when constructing UT, thus initially saving four sorting and 
four scanning steps. 

Doubling+Discard and Radix Heaps. Although the doubling technique gives the two 
most I/O-efficient algorithms for constructing large suffix arrays, it has the major drawback 
that its working space is large (i.e., 24N bytes) compared to the other known 
approaches (see Table 1). This is due to the fact that it uses external mergesort [14] to 
sort the list of tuples, and this algorithm requires an auxiliary array to store the intermediate 
results (see Section 3). Our new idea is to reduce the overall working space using 
an external version of the radix heap data structure introduced in [2]. Radix heaps are 
space efficient but their I/O-performance degenerates when the maximum priority value 
is large. The new algorithm replaces the mergesort routine in steps 1 and 3 above with a 
sorting routine based on external radix heaps [6]. This reduces the overall required space 
to 12N bytes, but at the cost of increasing the I/O-complexity (see Table 1). We will 
test this algorithm on real data to check whether the reduction in the number of 
processed tuples, induced by the discarding strategy, compensates for the time increase of 
the radix heap approach (see Section 3). 

Construction in L pieces. This algorithm improves over all the previous ones in terms 
of I/O-complexity, CPU-time and working space. It constructs the suffix array in 
pieces of equal size and thus turns out to be useful either when the underlying application does 
not need the suffix array as a unique data structure, but allows it to be kept in a distributed 
fashion [4]; or when we operate in a distributed-memory environment [18]. Unlike the 
approaches in [18], our algorithm does not need the careful setting of system parameters. 
It is very simple and applies in a different way, useful for practical purposes, a basic idea 
known so far only in the theoretical setting (see e.g. [9]). 

Let L be a positive integer parameter (to be fixed later), and assume that T is logically 
padded with L blank characters. The algorithm constructs L suffix arrays, say 
SA_1, SA_2, ..., SA_L, each of size N/L. Array SA_i stores the lexicographically ordered 
sequence of suffixes {T[i, N], T[i + L, N], T[i + 2L, N], ...}. The logic underlying 
our new algorithm is to first construct SA_L, and then derive all the other arrays 
SA_{L−1}, SA_{L−2}, ..., SA_1 (in that order) by means of a simple algorithm for sorting 
triples of integers. 

SA_L is built in two main stages: First, the string set S = {T[L, 2L − 1], T[2L, 3L − 1], 
T[3L, 4L − 1], ...} is formed and lexicographically sorted by means of any external 
string-sorting algorithm [1]; the compressed text T'[1, N/L] is then derived from 
T[L, N + L − 1] by replacing each string T[iL, (i + 1)L − 1] with its rank in the sorted set 
S. Subsequently, any known construction algorithm is used to build the suffix array SA' 
of T'; and then SA_L is derived by setting SA_L[j] = SA'[j] × L. The other L − 1 suffix 
arrays are constructed by exploiting the following observation: Any suffix T[i + kL, N] in 
SA_i can be seen as the concatenation of the character T[i + kL] and the suffix T[i + 1 + kL, N], 
which actually occurs in SA_{i+1}. It follows that, given SA_{i+1}, the construction of SA_i 
can be reduced to the sorting of O(N/L) triples (details in the full paper). 
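
The two stages can be sketched in main memory as follows. This is an illustration under stated assumptions rather than the external-memory implementation: the function name is hypothetical, the padding character is taken to be "\0" and is assumed not to occur in the text, in-memory sorting replaces the external string-sorting and triple-sorting routines, and the suffix array of T' is built naively.

```python
def suffix_arrays_in_pieces(text, L):
    """Return {i: SA_i} for i = 1..L, with 1-based text positions."""
    if L < 1:
        raise ValueError("L must be a positive integer")
    n = len(text)
    pad = "\0"                  # assumed smaller than every character of text
    t = text + pad * L          # t[p - 1] is T[p] for a 1-based position p

    # Stage 1: sort the blocks T[kL, (k+1)L - 1], replace each by its rank,
    # and build the suffix array of the compressed text T'.
    starts = list(range(L, n + 1, L))
    blocks = [t[p - 1:p - 1 + L] for p in starts]
    rank = {b: r for r, b in enumerate(sorted(set(blocks)))}
    t_prime = [rank[b] for b in blocks]
    sa_prime = sorted(range(len(t_prime)), key=lambda k: t_prime[k:])
    pieces = {L: [(k + 1) * L for k in sa_prime]}   # SA_L[j] = SA'[j] * L

    # Stage 2: derive SA_i from SA_{i+1} by sorting triples
    # (T[p], position of suffix p + 1 in SA_{i+1}, p).
    for i in range(L - 1, 0, -1):
        pos_in_next = {p: r for r, p in enumerate(pieces[i + 1])}
        triples = [(t[p - 1], pos_in_next.get(p + 1, -1), p)
                   for p in range(i, n + 1, L)]
        triples.sort()
        pieces[i] = [p for (_, _, p) in triples]
    return pieces
```

For T = abab and L = 2 the sketch returns SA_2 = [4, 2] and SA_1 = [3, 1]. A missing successor (p + 1 > N) is given position −1, which treats the empty suffix as the smallest one.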

Sorting the set S takes O(Sort(N)) random I/Os and 2N + 8N/L bytes, where 
Sort(N) = (N/B) log_{M/B}(N/B) [1]. Building the L suffix arrays takes O(Sort(N) log_2 N) 
random I/Os, O((N/M) log_2(N/M)) bulk I/Os and 24N/L bytes. Of course, the 
larger L is, the larger the number of suffix arrays to be constructed, but the smaller 
the working space required. By setting L = 4, we get an interesting trade-off: 6N 
working space, O(Sort(N) log_2 N) random I/Os and O((N/M) log_2(N/M)) bulk I/Os. 
The practical performance of this algorithm will be evaluated in Section 3. 
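
The 6N figure follows from the two space bounds above. The small Python sketch below is illustrative only; it assumes that the two phases reuse the same disk space, so that the working space is the larger of the two requirements, and the function name is hypothetical.

```python
def working_space_factor(L):
    """Working space of the L-pieces construction, in bytes per text character."""
    sort_phase = 2 + 8 / L     # sorting the set S: 2N + 8N/L bytes
    build_phase = 24 / L       # building the suffix arrays: 24N/L bytes
    # Assumption: the phases do not overlap, so the peak is the maximum.
    return max(sort_phase, build_phase)
```

For L = 4 this gives max(4, 6) = 6 bytes per character, i.e. the 6N working space quoted above; for L = 8 both phases need 3N bytes.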

3 Experimental Results 

We implemented the algorithms above using a recently developed external-memory library 
of algorithms and data structures called LEDA-SM [7] (an acronym for “LEDA 
for Secondary Memory”).² This library is an extension of the internal-memory library 
LEDA [19] and follows LEDA's main ideas: portability, efficiency and high-level 
specification of data structures. The specialty of LEDA-SM’s data structures is that we 
can specify (and therefore control) the maximum amount of internal memory that they 
are allowed to use; furthermore we can count the number of I/Os performed. This way, 
library LEDA-SM allows the programmer to experimentally investigate how the model 
parameters M and B influence the performance of an external-memory algorithm. As far as 
our experiments are concerned, we used the external-array data structure and the external 
sorting/scanning algorithms provided by LEDA-SM; the other in-core algorithms 
and data structures are taken from LEDA. In particular, we used an implementation of 
external mergesort that needs 2Xb bytes for sorting X items of b bytes each. 

The computer used in our experiments is a SUN ULTRA-SPARC 1/143 with 64 Mbytes 
of internal memory running the SUN Solaris 2.5.1 operating system. It is connected 

² For another interesting external-memory library see [22]. 







A. Crauser and P. Ferragina 



to one single Seagate Elite-9 SCSI disk via a fast-wide differential SCSI controller 
(B = 8 Kbytes). According to the adopted accounting scheme (see Section 1), we have 
chosen bulk_size = 64 disk pages, for a total of 512 Kbytes. This way, the seek time 
is 15% of a bulk I/O and we achieve 81% of the maximum transfer rate of our 
disk while keeping the service time of the requests still low. Of course, other values 
for the bulk_size might be chosen and tested, thus achieving different trade-offs 
between random/bulk disk accesses. However, the qualitative considerations on the 
algorithmic performance drawn in the next section will remain mostly unchanged. 

For our experiments we collected from various Web sites two textual datasets: the 
Reuters corpus³ of about 26 Mbytes; and a set of amino-acid sequences taken from a 
SWISSPROT database⁴ of about 26 Mbytes. These datasets have some nice features: 
the former set is structured and presents long repeated substrings, whereas the latter set 
is unstructured and thus suitable for full-text indexing. Notice that for N = 26 Mbytes 
the suffix array SA occupies 104 Mbytes, and the working space of all tested algorithms 
is more than 200 Mbytes. Hence, the datasets are large enough to evaluate the 
I/O-performance of the studied algorithms, and investigate their scalability in an 
external-memory setting. The overall results are reported in Tables 2 and 3. (For further comments 
and results we refer the reader to the full paper.) 

Results for the MM-algorithm. It is not astonishing to observe that the MM-algorithm 
is outperformed in construction time by every other algorithm studied in this paper 
as soon as its working space exceeds the internal memory size. This poor behavior 
is due to the fact that the algorithm accesses the suffix array in an unstructured and 
unpredictable way. When N > 8 Mbytes, its time complexity is still quasi-linear but 
the constant hidden in the big-Oh notation is very large due to the paging activity, thus 
making the algorithmic behavior unacceptable. 

Results for the BGS-algorithm. If we double the text size, the running time and I/Os 
increase by nearly a factor of four. The number of total and bulk I/Os is nearly identical 
for all datasets, so that the practical behavior is actually quadratic. It is not astonishing to 
verify experimentally that BGS is the fastest algorithm for building a (unique) suffix array 
when N < 25 Mbytes. This scenario probably remains unchanged for text collections 
which are slightly larger than the ones we experimented with in this paper; however, for larger 
and larger sizes, the quadratic behavior of BGS will probably no longer be “hidden” 
by its nice algorithmic properties (i.e., sequential scans of disk, small hidden constants, 
etc. [13]). In Table 3 we notice that (i) only 1% of all disk accesses are random 
I/Os; (ii) the algorithm performs the least number of random I/Os on both the datasets; 
(iii) BGS is the fastest algorithm to construct one unique suffix array, and it is the second 
fastest algorithm in general. Additionally we observe that its quadratic time complexity 
heavily influences its practical efficiency, so that disk-I/Os are not the only bottleneck 
for BGS. 

Results for the Doubling algorithm. The doubling algorithm performs 11 stages on the 
Reuters corpus, hence 21 scans and 21 sorting steps. Consequently, we can conclude 
that there is a repeated substring in this text collection of length about 2^^ (indeed, we 
detected a duplicated article). The Doubling algorithm scales well in the tested input 
range; however, due to the high number of random I/Os and to the large working space, 
we expect that this algorithm surpasses the performance of BGS only for very large values 
of N. Hence, although theoretically interesting and almost asymptotically optimal, the 
Doubling algorithm is not very appealing in practice; this motivated our development 
of the Doubl+Disc and Doubl+Disc+Radix algorithms. 

³ We used the text collection “Reuters-21578, Distribution 1.0” available from David D. Lewis’ 
professional home page, currently: http://www.research.att.com/~lewis 

⁴ See the site: http://www.bic.nus.edu.sg/swprot.html 

Results for Doubl+Disc algorithm. The number of discarded tuples is nearly the same 
as the size of the test set increases, and the gain induced is approximately 32% of the 
running time of Doubling. In our experimental datasets, we save approximately 19% of 
the I/Os compared to Doubling. The percentage of random I/Os is 28%, which is much 
less than for Doubling (42%), and leads us to conclude that discarding helps in reducing 
mainly the random I/Os. The saving induced by the discarding strategy is expected to pay off 
much more on larger text collections, because of the significant reduction in the number 
of manipulated tuples, which should facilitate caching and prefetching operations. 

Results for Doubl+Disc+Radix algorithm. This algorithm is not as fast as we 
conjectured. The reason is that we cannot fully exploit the good qualities of radix heaps 
by keeping the maximum priority value small. Step 2 of the Doubl+Disc algorithm must be 
implemented via two sorting steps and this naturally doubles the overall work. It is 
therefore not surprising to observe in Table 3 that the Doubl+Disc+Radix algorithm performs 
twice the I/Os of the other Doubling variants, and it is the slowest among all the tested 
algorithms. Consequently, the “compensation” conjectured in Section 2.1 between the 
number of discarded tuples and the increase in the I/O-complexity of heap-sorting does 
not actually occur in practice. If we consider the space vs. time trade-off, we can reasonably 
claim that Doubl+Disc+Radix is worse than BGS because the former requires larger 
working space and it is expected to surpass the BGS-performance only for very large 
text collections. 

Results for L-pieces algorithm. We fixed L = 4, used multi-way mergesort for 
string-sorting and Doubling for constructing SA_L. Looking at Table 3 we notice that 40% of 
the total I/Os are random, and that the present algorithm executes slightly more I/Os than 
BGS. Nonetheless, it is the fastest algorithm (see Table 2): It is three to four times faster 
than BGS (due to the quadratic CPU-time of BGS) and four times faster than the Doubl+Disc 
algorithm (due to the larger number of I/Os of the latter). The running time distributes as follows: 
63% is used to build SA_L; 4% to sort the set S; the rest is used to build the other three 
suffix arrays. It must be said that for our test sizes, the short strings fit in internal memory 
at once, thus making their sorting stage very fast. However, it is also clear that sorting 
short strings takes no more time than the one needed by one stage of the Doubl-algorithm. 
Consequently, it is not hazardous to conjecture that this algorithm is still significantly 
faster than all the other approaches when working on S and the SA_i's entirely on disk. 
The only “limit” of this algorithm is that it constructs the suffix array in four distinct 
pieces. If the underlying text-retrieval application does not require one unique 
suffix array [4], then this approach turns out to be de facto ‘the’ choice for constructing 
such a data structure. 

4 The New BGS-Algorithm 

We conclude our paper by addressing one further issue related to the intriguing, and 
apparently counterintuitive, “contradiction” between the effective practical performance 
of the BGS-algorithm and its unappealing (i.e., cubic) worst-case complexity. We briefly 
sketch (details in the full paper) a new approach that follows the basic philosophy 
underlying the BGS-design but in a significantly different manner, thus resulting in 
a novel algorithm which combines good practical qualities with efficient worst-case 
performance. 

Let us set m = ℓM, where ℓ < 1 is a positive constant to be fixed later. We 
divide the text T into k non-overlapping substrings of length m each, namely T = 
T_k T_{k−1} ··· T_2 T_1. The algorithm executes k = O(N/M) stages (like BGS) and processes 
the text from the right to the left (unlike BGS). The following invariant is kept 
inductively before stage h starts: String S = T_{h−1} T_{h−2} ··· T_1 is the text part processed 
in the previous (h − 1) stages. The algorithm has computed and stored on disk two data 
structures: The suffix array SA_ext of the string S and its “inverse” array Pos_ext, which 
keeps at each entry Pos_ext[j] the position in SA_ext of the suffix S[j, |S|]. After all k 
stages are executed, we have S = T and thus SA = SA_ext. 
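
The relation between a suffix array and its inverse can be illustrated with a few lines of Python. This is only a sketch of the data structure, not of the new BGS algorithm itself; the function name and the convention of leaving index 0 unused (so that positions are 1-based as in the text) are assumptions made here.

```python
def inverse_suffix_array(sa):
    """Given a 1-based suffix array sa, return pos with pos[sa[r - 1]] = r.

    pos[j] is the (1-based) position in sa of the suffix starting at j;
    pos[0] is unused.
    """
    pos = [0] * (len(sa) + 1)
    for r, start in enumerate(sa, start=1):
        pos[start] = r
    return pos
```

For SA = [1, 3, 2, 4, 5] (the suffix array of ababc) the inverse is pos[1..5] = [1, 3, 2, 4, 5]: for example, the suffix starting at position 2 is the third smallest, so pos[2] = 3.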

The main idea underlying the leftward-scanning of the text is that when the h-th 
stage processes the text suffixes starting in T_h, it has already accumulated into SA_ext 
and Pos_ext some information about the text suffixes starting to the right of T_h. This 
way, the comparison of the former text suffixes can be done by exploiting these two 
arrays, and thus using only localized information which eliminates the need of random 
I/Os (cfr. construction of SA_int in BGS [13]). The next Lemma formalizes this intuition 
(proof in the full version): 

Lemma 1. A suffix T[i, N] starting in the text piece T_h can be represented succinctly 
via the pair (T[i, i + m − 1], Pos_ext[((i + m − 1) mod m) + 1]). Consequently, all text 
suffixes starting in T_h can be represented using overall O(m) space. 

Stage h preserves the invariant above and thus updates SA_ext and Pos_ext by properly 
inserting into them the “information” regarding the text suffixes of T which start in the 
text piece T_h (currently processed). After that, the new SA_ext and Pos_ext will correctly 
refer to the “extended” string T_h · S, thus preserving the invariant for the next (h + 1)-th 
stage (where S = T_h · S = T_h T_{h−1} ··· T_2 T_1). The algorithmic details are not obvious 
but due to space limitations we refer the interested reader to the full paper, where we 
will prove that: 

Theorem 2. The suffix array of a text $T[1, N]$ can be constructed in $O(N^2/M^2)$ bulk-
I/Os, no random-I/Os, and $8N$ disk space in the worst case. The overall CPU time is
$O\left(\frac{N^2}{M} \log_2 M\right)$.

The value of the parameter $\ell$ is properly chosen to fit the auxiliary data structures into
internal memory. The practical behavior of the new-BGS algorithm is guaranteed on any
indexed text independently of its structure, thus overcoming the (theoretical) limitations
of the classical BGS [13] while still keeping its attractive practical properties.




On Constructing Suffix Arrays in External Memory 233 



5 Conclusions 

It is often observed that practitioners use algorithms which tend to be different from
what is claimed as optimal by theoreticians. This is doubtless because theoretical
models tend to be simplifications of reality, and theoretical analyses need to use conservative
assumptions. In the present paper we tried to “bridge” this difference by
analyzing more deeply some suffix-array construction algorithms, taking more into account
the specialties of current disk systems, without going into much technological detail
but still referring to an (abstract) I/O-model. As appears clear from the experiments,
the final choice of the “best” algorithm depends on the available disk space, on the disk
characteristics (which induce different trade-offs between random and bulk I/Os), on
the structural features of the indexed text, and also on the patience of the user to wait
for the completion of the suffix-array construction. However, it must be noticed that the
running-time evaluations indicated in our tables and pictures are clearly not intended
to be definitive. Algorithmic engineering and software tuning of the C++ code might
definitely lead to improvements without changing the features of the
experimented algorithms, and therefore without significantly affecting the scenario that we
have depicted in these pages. The qualitative analysis developed in the previous sections
should, in our opinion, guide software developers and clarify which algorithm best
fits their needs.

The results in this paper suggest some other directions of research that deserve further
investigation. The most notable one is, in our opinion, an adaptation and simplification
of the I/O-optimal algorithm for building suffix trees [10] to the I/O-optimal (direct)
construction of suffix arrays.

Table 2. Construction time (in seconds) of all experimented algorithms on the two text collections,
whose size N is expressed in bytes. The symbol ’ indicates that the test was stopped after 63
hours.

Table 3. Number of I/Os (bulk/total) of all experimented algorithms on the two text collections,
whose size N is expressed in bytes.

Abstract. Let us consider an ordered set of elements $A = \{a_1 < \cdots < a_n\}$
and a list of positive costs $c_1, \ldots, c_n$, where $c_i$ is the access cost of the element
$a_i$. Any search strategy for the set $A$ can be represented by a binary search tree
(BST) with $n$ nodes, where each node corresponds to an element of $A$. The search
strategy with minimum expected access cost is given by the BST that minimizes
$\sum_{i=1}^{n} c_i\, n(a_i)$ among all binary trees, where $n(a_i)$ denotes the number of nodes
in the subtree rooted at the node that corresponds to the element $a_i$.

In this paper, we prove that the cost of an optimal search tree is bounded above by
$4C^* \ln(1+n)$, where $C^* = \sum_{i=1}^{n} c_i$. Furthermore, we show that this upper bound is
asymptotically optimal. The proof of this upper bound is constructive and generates
a $4\ln(1+n)$-approximate algorithm for constructing search trees. This algorithm
runs in $O(nH)$ time and requires $O(n)$ space, where $H$ is the height of the tree
produced at the end of the algorithm. We also prove some combinatorial properties
of the optimal search trees, and based on them, we propose two heuristics for
constructing search trees. We report some experimental results that indicate a
good performance of these heuristics. The algorithms devised in this paper can be
useful for practical cases, since the best known exact algorithm for this problem
runs in $O(n^3)$ time, requiring $O(n^2)$ space.



1 Introduction 

Let us consider an ordered set of elements $A = \{a_1 < \cdots < a_n\}$, a list of positive
costs $c_1, \ldots, c_n$ and a list of probabilities $p_1, \ldots, p_n$, where $c_i$ and $p_i$ are, respectively,
the access cost and the access probability of element $a_i$. Any search strategy for the
set $A$ can be represented by a binary search tree (BST) with $n$ nodes, where each node
corresponds to an element of set $A$. The search strategy with minimum expected access
cost is given by the BST that minimizes

$$\sum_{i=1}^{n} c_i P_i \qquad (1)$$

among all BSTs with $n$ nodes, where $P_i$ is the probability of the subtree rooted at the
node corresponding to the element $a_i$ in the tree. In fact, $P_i$ is given by the sum of the
probabilities of all the elements in this subtree.

The best exact algorithm for finding the optimal search strategy is based on dynamic
programming and runs in $O(n^3)$ time, with $O(n^2)$ space requirement [1]. The case with
uniform costs and different access probabilities has been extensively studied [2,3,4].
The best exact algorithm for this case is due to Knuth [2]. It runs in $O(n^2)$ time, with
$O(n^2)$ space requirement.

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 236-247, 1999.
In this paper we consider the case where the costs are different and all
elements are equiprobable [5,6,1]. This case has applications in filter design [6] and
in searching in hierarchical memories [5]. In [6], Knight proposed a simple dynamic
programming algorithm to build the optimal search strategy. This algorithm runs in
$O(n^3)$ time, requiring $O(n^2)$ space. Currently, this is the best exact algorithm for this
problem. Knight also analyzed the expected cost of the search for the cost structure
$c_i = i^k$, where $k$ is a fixed constant. This cost structure arises in filter design problems.
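A dynamic program of this kind can be sketched as follows. The recurrence is an assumption derived from the cost definition used in this paper (a root $a_k$ of an interval with $s$ nodes contributes $s \cdot c_k$, and the two sub-intervals are solved independently); it is not claimed to be Knight's exact formulation, and the name `optimal_cost` is hypothetical. Indices are 0-based, and the returned value is $\sum_i c_i\, n(a_i)$ without the $1/n$ factor.

```python
from functools import lru_cache


def optimal_cost(costs):
    """Minimum of sum_i c_i * n(a_i) over all BSTs on costs (O(n^3) time, O(n^2) space)."""
    n = len(costs)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Optimal cost for the interval of indices i..j (inclusive); empty intervals cost 0.
        if i > j:
            return 0
        size = j - i + 1
        # Try every node k of the interval as root: it is counted once per node below it.
        return min(size * costs[k] + best(i, k - 1) + best(k + 1, j)
                   for k in range(i, j + 1))

    return best(0, n - 1)
```

For 50 equal unit costs the function returns 243, that is 4.86 after dividing by the total cost, which agrees with the $k = 0$, $n = 50$ entry reported for Opt in Table 1.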

Nevertheless, the time and space requirements of the exact algorithm make it
prohibitive for big values of $n$. Motivated by this fact, we consider alternatives for constructing
good search trees. In this paper, we obtain the following results. We present a necessary
condition for a BST to be an optimal search tree. Based on this condition, we give a
nontrivial upper bound for the height of an optimal search tree. In fact, we prove that the
height of an optimal search tree is $O(\sqrt{n})$ for practical cases. We present the algorithm
Ratio, a $4\ln(1+n)$-approximate algorithm that runs in $O(nH)$ time, with $O(n)$ space
requirement, where $H$ is the height of the tree obtained at the end of the algorithm.
The analysis of this algorithm shows that $4C\ln(1+n)$ is an upper bound for the cost
of an optimal search tree, where $C = \sum_{i=1}^{n} c_i$. We also prove that this upper bound
is asymptotically optimal. Finally, we propose two practical heuristics for constructing
search trees and we report some experiments comparing their results with the results
obtained by other algorithms proposed in the literature. These experiments indicate a
good performance of the heuristics devised in this paper.

Since all the elements are equiprobable, (1) can be rewritten as

$$\frac{1}{n} \sum_{i=1}^{n} c_i\, n(a_i)$$

where $n(a_i)$ denotes the number of nodes in the subtree rooted at the node that
corresponds to the element $a_i$. For convenience, we omit the $1/n$ factor.

Throughout this paper, as an abuse of notation, we also use $a_i$ to denote the node
in a BST that corresponds to the element $a_i$, for $i = 1, \ldots, n$. In addition, given an
ordered set $a_1 < \cdots < a_n$ and a corresponding BST $T$, we observe that the ordered
set $a_i < \cdots < a_j$ that corresponds to a given subtree $T'$ of $T$ is always a contiguous
subset of $a_1 < \cdots < a_n$. Hence, this subset can be represented by the interval of indexes
$[i, \ldots, j]$. This representation is extensively used throughout this paper. Finally, we use
$C$ to denote $\sum_{i=1}^{n} c_i$.

This paper is organized as follows. In section 2, we show a necessary condition for a
binary tree to be an optimal search tree and we give an upper bound for the height of an
optimal search tree. In section 3, we present the algorithm Ratio and an upper bound
for the cost of an optimal search tree. In section 4, we propose two heuristics based on
the properties proved throughout the paper and we report some experiments involving
these heuristics and other algorithms proposed in the literature.

2 Combinatorial Properties



In this section, we show some combinatorial properties of optimal search trees. We show
a necessary condition for a given node to be the root of an optimal search tree and we give
a necessary condition for the height of an optimal tree to be equal to $H$.



2.1 Optimality Condition 

Theorem 1. If the node $a_k$ is the root of the optimal search tree $T$ for the nodes in the
interval $[1, \ldots, m]$ then:

(i) For all $j < k$ we have $c_j \ge \dfrac{j\, c_k}{m - n(a_j)}$.

(ii) For all $l > k$ we have $c_l \ge \dfrac{(m - l + 1)\, c_k}{m - n(a_l)}$.



Proof. The idea of the proof is to show that if these conditions do not hold, then the
cost of the tree $T$ can be improved. We only give a proof that condition (i) holds, since
the proof for condition (ii) is entirely analogous.

Let us assume that $a_k$ is the root of the optimal search tree $T$ for the interval
$[1, \ldots, m]$. Now, let $j < k$ and let $T^*$ be the tree obtained after the application of
the procedure below.

While $a_j$ is not the root of $T$, obtain a new tree $T$ through a rotation involving
$a_j$ and its parent.

Figure 1(a) shows a tree $T$ and Figure 1(b) shows the tree $T^*$ obtained after applying
the proposed transformation.

The proposed transformation implies two facts that we mention without proof.

(a) If $a_{j'}$ is not an ancestor of node $a_j$ in $T$, then its number of descendants does not
change from $T$ to $T^*$.

(b) If $a_{j'}$ is an ancestor of node $a_j$ in $T$ and $j' \ne j$, then its number of descendants
in $T$ is greater than or equal to its number of descendants in $T^*$.

Let $n^*(a_h)$ be the number of descendants of node $a_h$ in the tree $T^*$. We have that
$n^*(a_j) = m$ and $n(a_k) - n^*(a_k) = j$. Furthermore, it follows from (a) and (b) that
$n^*(a_h) \le n(a_h)$ for $h \ne j$. Hence, we have that

$$c(T^*) - c(T) \le (m - n(a_j))\, c_j - j\, c_k .$$

Hence, we must have $c_j \ge \dfrac{j\, c_k}{m - n(a_j)}$, otherwise we would have $c(T^*) < c(T)$, which
contradicts the optimality of $T$. □



We point out that this result can be easily generalized to the case with different
access probabilities. Now, we present a corollary of the previous theorem that will be
used in the design of two heuristics in section 4.

Fig. 1. Figure (a) shows a tree $T$ with root $a_k$ and figure (b) shows the tree $T^*$ obtained by applying
the transformation.



Corollary 2. If the node $a_k$ is the root of the optimal search tree for the interval
$[i, \ldots, m]$ then:

(i) For all $j < k$ we have $c_j \ge \dfrac{(j - i + 1)\, c_k}{m - i}$.

(ii) For all $l > k$ we have $c_l \ge \dfrac{(m - l + 1)\, c_k}{m - i}$.

Proof. This result follows immediately from the previous theorem since $n(a_j) \ge 1$ and
$n(a_l) \ge 1$. □

2.2 Bounding the Height of an Optimal Search Tree

Now, we apply Theorem 1 to give a necessary condition for the height of an optimal
search tree to be equal to $H$.

Theorem 3. If the height of the optimal search tree for the interval $[1, \ldots, n]$ is $H$, then
there are two elements $a_i$ and $a_j$ such that

$$\frac{c_i}{c_j} \ge (\lceil H/2 \rceil - 1)! \left( \frac{\lceil H/2 \rceil - 1}{n} \right)^{\lceil H/2 \rceil - 1} .$$

Proof. See appendix A. □



By using Stirling's approximation for $\lceil H/2 \rceil!$, we can obtain the following corollary.



240 



E. Sany Laber, R. Luiz Milidiu, and A. Alves Pessoa 



Corollary 4. If the height of the optimal search tree for the interval $[1, \ldots, n]$ is $H$,
then, for large $n$, there are two elements $a_i$ and $a_j$ such that

$$\frac{c_i}{c_j} \ge \left( \frac{H^2}{4ne} \right)^{H/2} .$$

A consequence of this last result is that the height of an optimal search tree for
practical cases must be $O(\sqrt{n})$; otherwise there must be costs differing by an enormous
factor such as $f^f$, where $f$ is $\omega(1)$.



3 An Approximate Algorithm 

In [6], Knight proved upper and lower bounds for the cost of optimal trees for a special
structure of costs. He considered the structure $c_i = i^k$ for $i = 1, \ldots, n$, where $k$ is a
fixed constant. In this section, we prove that the cost of an optimal search tree is bounded
above by $4C \ln(1 + n)$. It must be observed that we do not assume anything concerning
the costs.

The proof of this result is based on the analysis of the algorithm Ratio presented in
figure 2. This algorithm uses a top-down approach combined with a simple rule to select
the root of the search tree for the current interval of nodes. In the pseudo-code, $a_k$.left
and $a_k$.right are, respectively, the left and the right children of the node $a_k$. If a node
$a_k$ does not have a left (right) child, then null is assigned to $a_k$.left ($a_k$.right).



Algorithm Ratio;
    root ← Root(1, n)

Function Root(i, m): integer;
    If i ≤ m then
        1. Find the node a_k that minimizes c_k / min{k − i + 1, m − k + 1} on the interval [i, .., m]
        2. a_k.left ← Root(i, k − 1); a_k.right ← Root(k + 1, m)
        3. Return k
    Else
        Return null

Fig. 2. The Ratio Algorithm.
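The pseudo-code of Figure 2 can be turned into a short runnable sketch. The array representation (`left`, `right`, with -1 standing for null), the 0-based indices, the helper `tree_cost` and the tie-breaking rule (the leftmost minimizer wins) are assumptions made here for illustration; the paper does not prescribe them.

```python
import math


def ratio_tree(costs):
    """Build the tree selected by the Ratio rule; returns (root, left, right)."""
    n = len(costs)
    left = [-1] * n
    right = [-1] * n

    def root(i, m):
        if i > m:
            return -1  # empty interval: null child
        # Node k minimising c_k / min(k - i + 1, m - k + 1) on [i..m].
        k = min(range(i, m + 1),
                key=lambda t: costs[t] / min(t - i + 1, m - t + 1))
        left[k] = root(i, k - 1)
        right[k] = root(k + 1, m)
        return k

    return root(0, n - 1), left, right


def tree_cost(costs, root, left, right):
    """Return sum_i c_i * n(a_i), with n(a_i) the size of the subtree rooted at a_i."""
    total = 0

    def size(v):
        nonlocal total
        if v == -1:
            return 0
        s = 1 + size(left[v]) + size(right[v])
        total += costs[v] * s
        return s

    size(root)
    return total
```

Both functions recurse once per tree level, so for very unbalanced trees on large inputs an explicit stack would be needed to stay within Python's recursion limit.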



3.1 Cost Analysis 

In this section we analyze the cost of the tree produced by algorithm Ratio. In order to
bound this cost, we need the following proposition.

Proposition 5. If the node $a_k$ is selected by the algorithm Ratio to be the root of the
nodes interval $[1, .., n]$, then $c_k \le \dfrac{4kC}{n(n+2)}$.

Proof. We assume without loss of generality that $k \le \lceil n/2 \rceil$, since the other case is
symmetric. Since $k$ is selected, we have the following inequalities:

$$c_i \ge \frac{c_k \times \min\{i,\ n - i + 1\}}{k}, \quad \text{for } i = 1, \ldots, n .$$

By summing the inequalities above over $i$, we obtain that

$$\sum_{i=1}^{n} c_i \ \ge\ \sum_{i=1}^{\lceil n/2 \rceil} \frac{c_k \times i}{k} + \sum_{i=\lceil n/2 \rceil + 1}^{n} \frac{c_k \times (n - i + 1)}{k} \ \ge\ \frac{c_k (n+2) n}{4k} .$$

Since $\sum_{i=1}^{n} c_i = C$, we obtain that

$$c_k \le \frac{4kC}{n(n+2)} .$$ □



Now, we prove that the cost of the tree constructed by algorithm Ratio is bounded
above by $4C \ln(1 + n)$.

Theorem 6. The cost of the tree $T$ constructed by the algorithm Ratio for the interval
of nodes $[1, \ldots, n]$, with $\sum_{i=1}^{n} c_i = C$, is bounded above by $4C \ln(1 + n)$.

Proof. If $n = 1$, the result obviously holds. Now, we assume that the result holds for
every interval with fewer than $n$ nodes and we prove that it holds for $n$ nodes.

Let us assume that $k$ is the index selected by the algorithm Ratio to be the root
of the interval of nodes $[1, .., n]$. Moreover, we assume without loss of generality that
$k \le \lceil n/2 \rceil$. Hence, the cost $c(T)$ of the tree $T$ constructed by the algorithm Ratio is
given by

$$c(T) = n c_k + c(T(1, k-1)) + c(T(k+1, n)),$$

where $c(T(1, k-1))$ and $c(T(k+1, n))$ are, respectively, the costs of the trees
constructed by the algorithm Ratio for the intervals of nodes $[1, .., k-1]$ and $[k+1, .., n]$.
If $C_1 = \sum_{i=1}^{k-1} c_i$ and $C_2 = \sum_{i=k+1}^{n} c_i$, then it follows from induction that

$$c(T) \le n c_k + 4C_1 \ln(k) + 4C_2 \ln(n - k + 1).$$

Since $\ln(k) \le \ln(n - k + 1)$ for $1 \le k \le \lceil n/2 \rceil$ and $C_1 + C_2 \le C$, we obtain
that

$$c(T) \le n c_k + 4(C_1 + C_2) \ln(n - k + 1) \le n c_k + 4C \ln(n - k + 1).$$

On the other hand, proposition 5 assures that $c_k \le (4kC)/(n(n+2))$. Hence,

$$c(T) \le \frac{4kC}{n+2} + 4C \ln(n - k + 1).$$

Nevertheless, we can show by using some calculus techniques that $f(k) = \frac{4kC}{n+2} +
4C \ln(n - k + 1)$ reaches its maximum in the interval $[0, \lceil n/2 \rceil]$ when $k = 0$. Hence,
we have that

$$c(T) \le \frac{4kC}{n+2} + 4C \ln(n - k + 1) \le 4C \ln(1 + n),$$

which establishes the theorem. □
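The calculus step in the proof can be made explicit with a short worked derivation. It treats $k$ as a real variable in $[0, \lceil n/2 \rceil]$, which is an assumption of this sketch rather than something spelled out in the proof.

```latex
f(k) = \frac{4kC}{n+2} + 4C \ln(n-k+1),
\qquad
f'(k) = \frac{4C}{n+2} - \frac{4C}{n-k+1} .
% For 0 <= k <= ceil(n/2) we have 0 < n-k+1 <= n+1 < n+2, hence
% 1/(n-k+1) > 1/(n+2) and f'(k) < 0: f is decreasing on the interval.
\max_{0 \le k \le \lceil n/2 \rceil} f(k) = f(0) = 4C \ln(n+1) .
```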

Now, we give an immediate corollary of the previous result. 

Corollary 7. The cost of an optimal search tree for the elements $a_1 < \cdots < a_n$, with
corresponding costs $c_1, \ldots, c_n$, is bounded above by $4C \ln(1 + n)$, where $C = \sum_{i=1}^{n} c_i$.
Furthermore, this bound is asymptotically optimal up to constant factors.

Proof. The upper bound for the cost of an optimal search tree follows immediately from
the previous theorem. The asymptotic optimality follows from the fact that the cost of
an optimal search tree for the list of costs $c_1 = \cdots = c_n$ is given by $C \log n$. □

The expected cost of the tree produced by the Ratio algorithm is thus proved to be
$O((C/n) \ln(1 + n))$. This is the best that we can have if all the costs are equal. It
would be interesting to determine whether there is a constant $k$ such that $c(T_R)/c(T^*) \le k$ for
any list of costs, where $c(T_R)$ and $c(T^*)$ are, respectively, the cost of the tree produced
by the Ratio algorithm and the cost of an optimal tree. So far, we do not know how to
answer this question.

The algorithm Ratio runs in $O(nH)$ time, where $H$ is the height of the tree obtained
at the end of the algorithm. In order to have a better idea of the running time, the value
of $H$ must be bounded in terms of the input costs as we did for the optimal search tree.

4 Search Heuristics and Experimental Results 

In this section, we present some heuristics to construct search strategies and we report 
some experimental results obtained by comparing the performance of these heuristics 
with the optimal search strategy. 

We introduce two heuristics that we call Candidate Heuristics. First, we define the 
concept of a candidate node. 

Definition 8. If the node ak respects the conditions of Corollary 2 for a given interval 
[i, ... ,m], then we say that this node is a candidate node. 

For example, let us consider the nodes $a_3, \ldots, a_8$, with respective costs $c_3 = 3$, $c_4 = 6$,
$c_5 = 20$, $c_6 = 30$, $c_7 = 5$, $c_8 = 8$. In this case, the candidate nodes are the nodes
$a_3$, $a_4$, $a_7$. As an example, the node $a_5$ cannot be a candidate node since $c_3 < 20/(6 - 1) = 4$.

Now, we are ready to explain the idea of candidate heuristics. We say that a heuristic
A is a candidate heuristic if it satisfies the two items below.

(i) A uses a top down approach, that is, at each step A selects the root of the current 
interval through some criterion and, after that, it recursively considers the left and the 
right intervals. 

(ii) A always selects a candidate node to be the root of the current interval. 

We must observe that the difference between two candidate heuristics is the criterion 
used to select the root node among the candidate nodes, for the current interval. We point 
out that the algorithm Ratio presented in the previous section is a candidate heuristic, 
since we can prove that the node selected by the algorithm to be the root of the interval 
[i, ...,m] is always a candidate node. Next, we list two possible criteria for the selection 
of the root, each of them defining a different candidate heuristic. 

1. Select the candidate node that minimizes the absolute difference between the sum of
the costs of the nodes on its left side and the sum of the costs of the nodes on its
right side.

2. Select the candidate node with index closest to the median of the current interval.

Let us consider the example presented in this section. If we choose criterion 1, then
the heuristic selects the node $a_7$, since it minimizes the absolute difference between the
sums of the costs of its right and left nodes. If we choose criterion 2, then the heuristic
selects either the node $a_4$ or the node $a_7$. Figure 3 presents pseudo-code for a generic
candidate heuristic.



Heuristic Candidate;
    root ← Root(1, n)

Function Root(i, m): integer;
    If i ≤ m then
        1. Find the set of candidates for the interval [i, ..., m]
        2. Among the set of candidate nodes, choose the node a_k that satisfies the desired criterion
        3. a_k.left ← Root(i, k − 1); a_k.right ← Root(k + 1, m); Return k
    Else
        Return null

Fig. 3. The Heuristic Candidate.
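A generic candidate heuristic in the spirit of Figure 3 might look as follows. This is only a sketch: the function names, the 0-based indexing and the tie-breaking (the first minimizer wins) are assumptions, and the candidate test is the straightforward quadratic check of Corollary 2 rather than a linear-time scan. Integer cross-multiplication is used to avoid rounding issues.

```python
def is_candidate(costs, i, m, k):
    """Check both conditions of Corollary 2 for node k on the interval [i..m]."""
    width = m - i
    left_ok = all(costs[j] * width >= (j - i + 1) * costs[k] for j in range(i, k))
    right_ok = all(costs[l] * width >= (m - l + 1) * costs[k]
                   for l in range(k + 1, m + 1))
    return left_ok and right_ok


def criterion_balance(costs, i, m, cands):
    """Criterion 1: minimise |cost sum on the left - cost sum on the right|."""
    return min(cands, key=lambda k: abs(sum(costs[i:k]) - sum(costs[k + 1:m + 1])))


def criterion_median(costs, i, m, cands):
    """Criterion 2: candidate whose index is closest to the middle of [i..m]."""
    return min(cands, key=lambda k: abs(2 * k - (i + m)))


def candidate_heuristic(costs, choose):
    """Top-down construction; returns (root, left, right) with -1 standing for null."""
    n = len(costs)
    left, right = [-1] * n, [-1] * n

    def root(i, m):
        if i > m:
            return -1
        cands = [k for k in range(i, m + 1) if is_candidate(costs, i, m, k)]
        k = choose(costs, i, m, cands)
        left[k] = root(i, k - 1)
        right[k] = root(k + 1, m)
        return k

    return root(0, n - 1), left, right
```

On the example costs 3, 6, 20, 30, 5, 8, criterion 1 places the node of cost 5 at the root, while criterion 2 faces a tie between the nodes of cost 6 and cost 5.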



Now, we show how to find the set of candidates for a given interval $[i, \ldots, m]$ in
$O(m - i)$ time. The pseudo-code is presented in Figure 4. First, the procedure scans the
vector from left to right, checking for each new index $k$ whether the node $a_k$ satisfies
condition (i) of Corollary 2 or not. This check can be done in $O(1)$ by comparing $a_k$
with $a_{j^*}$, where $a_{j^*}$ is the node that minimizes $c_j/(j - i + 1)$ in the interval $[i, .., k - 1]$.
Next, the procedure scans the vector from right to left, testing whether $a_k$ satisfies condition
(ii) of Corollary 2 or not. This test is analogous to that performed in the first scan.
At the end of the procedure, the nodes that satisfy conditions (i) and (ii) of Corollary 2
are the candidate nodes.

CandidateSet(i, m);
    For j = i to m do candidate[j] = true
    j* ← i
    For j = i + 1 to m do
        If c_{j*}/(j* − i + 1) < c_j/(m − i) then candidate[j] = false
        If c_j/(j − i + 1) < c_{j*}/(j* − i + 1) then j* = j
    End For
    j* ← m
    For j = m − 1 downto i do
        If c_{j*}/(m − j* + 1) < c_j/(m − i) then candidate[j] = false
        If c_j/(m − j + 1) < c_{j*}/(m − j* + 1) then j* = j
    End For

Fig. 4. Finding the set of candidates.
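The two scans can be sketched in Python as follows. The function name `candidate_set`, the 0-based inclusive interval and the use of integer cross-multiplication instead of division are assumptions of this illustration, not part of the original procedure.

```python
def candidate_set(costs, i, m):
    """Return the candidate indices of the interval [i..m] in O(m - i) time."""
    width = m - i
    cand = [True] * (m - i + 1)

    # Left-to-right scan, condition (i): best is the index j* minimising
    # c_j / (j - i + 1) among the indices already seen.
    best = i
    for k in range(i + 1, m + 1):
        # a_k fails iff c_{j*} / (j* - i + 1) < c_k / (m - i).
        if costs[best] * width < (best - i + 1) * costs[k]:
            cand[k - i] = False
        if costs[k] * (best - i + 1) < costs[best] * (k - i + 1):
            best = k

    # Right-to-left scan, condition (ii): best minimises c_l / (m - l + 1).
    best = m
    for k in range(m - 1, i - 1, -1):
        if costs[best] * width < (m - best + 1) * costs[k]:
            cand[k - i] = False
        if costs[k] * (m - best + 1) < costs[best] * (m - k + 1):
            best = k

    return [i + t for t, ok in enumerate(cand) if ok]
```

On the example costs 3, 6, 20, 30, 5, 8 the function returns the 0-based indices 0, 1 and 4, that is the nodes $a_3$, $a_4$ and $a_7$ of the example in this section.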



4.1 Complexity of Candidate Heuristics 

The time complexity of a candidate heuristic depends on the criterion used to decide
which candidate node will be the root of the current interval. The two criteria proposed
in the previous section can be implemented to run in linear time. Since the set of candidate
nodes can be found in $O(m - i)$ time, the two candidate heuristics run in $O(nH)$ time, where
$H$ is the height of the tree produced by each heuristic. Since $H$ is $O(n)$, the heuristics
run in $O(n^2)$ time.

4.2 Experimental Results 

In this section, we report some experiments performed to evaluate the quality of the
heuristics proposed in this paper. We consider six different algorithms: the optimal
algorithm (Opt) [6], the algorithm Ratio, the candidate heuristic with criterion 1 (Cand1),
the candidate heuristic with criterion 2 (Cand2), a greedy algorithm that always chooses
the node with minimum cost (Small) [1], and an ordinary binary search (BinS).

We compare the behavior of these algorithms for two kinds of cost structures. The
first one is the structure that arises in a filter design problem [6]. The cost $c_i$ is given by
$c_i = i^k$, where $k$ is a fixed constant. The second one is a random structure of costs. The costs
are generated by choosing $c_i$ randomly in the interval $[1, .., c^*]$, where $c^*$ is chosen as
a function of $n$.

Table 1 presents the results obtained by the algorithms for the structure of costs
$c_i = i^k$ for $k = 0, 1, 2, 3$ and $n = 50, 200, 500$. For equal costs ($k = 0$), all the
heuristics obtain an optimal search strategy. This fact is easy to explain analytically.
For $k \ge 1$, we observe that the algorithms Ratio, Cand2 and BinS achieved very good
results. The relative error of these algorithms, compared to the optimal one, was smaller
than 4% in all cases. The algorithm Cand1 also obtained good results. Its relative
error ranged from 5% to 12%. The algorithm Small obtained very poor results, as
we could predict analytically. In fact, the algorithm Small was designed for a random
structure of costs and for the case where the access cost depends on the previous access
[1].



Table 1. Results for the cost structure $c_i = i^k$ for n = 50, 200, 500 and k = 0, 1, 2, 3.

k        0                     1                     2                     3
n        50     200    500     50     200    500     50     200    500     50     200
Opt      4.86   6.76   8.00    4.75   6.66   7.98    4.39   6.27   7.56    4.09   5.94
Ratio    4.86   6.76   8.00    4.85   6.76   8.00    4.47   6.35   7.60    4.20   5.99
Cand1    4.86   6.76   8.00    5.34   7.29   8.60    4.82   6.58   7.95    4.47   6.41
Cand2    4.86   6.76   8.00    4.83   6.76   7.99    4.52   6.39   7.65    4.19   6.05
Small    4.86   6.76   8.00    17.33  67.33  167.33  13.13  50.63  125.63  10.61  40.60
BinS     4.86   6.76   8.00    4.83   6.76   7.99    4.54   6.44   7.66    4.24   6.11


Table 2 presents the results obtained by the algorithms for the random cost structure.
We choose the value of the maximum cost $c^*$ as a function of $n$. For each $n$, we consider
3 values for $c^*$: $c^* = n/10,\ n,\ 10n$. In order to obtain more stable results, we ran
the experiments one hundred times for each pair $(n, c^*)$. Afterwards, we computed the average
cost for each experiment. Looking at Table 2, we observe that the algorithm Ratio obtained
excellent results in all experiments. Its relative error was smaller than 1% in all cases. The
algorithms Cand1, Cand2 and Small obtained good results in all cases. The maximum
relative error of these algorithms was about 15%. Finally, the ordinary binary search
obtained poor results.



Table 2. Results for the random cost structure

c*       n/10                  n                     10n
n        50     200    500     50     200    500     50     200    500
Opt      3.06   2.88   2.80    2.53   2.65   2.68    2.50   2.64   2.67
Ratio    3.08   2.89   2.82    2.54   2.66   2.70    2.51   2.65   2.68
Cand1    3.46   3.19   3.08    2.72   2.87   2.92    2.68   2.85   2.90
Cand2    3.50   3.30   3.21    2.79   2.99   2.99    2.76   2.95   2.99
Small    3.17   3.08   3.05    2.76   2.92   2.97    2.75   2.92   2.96
BinS     4.86   6.73   8.09    4.95   6.86   7.87    4.81   6.75   7.98

Comparing all the experiments, we see that the Ratio algorithm achieved a very
impressive result, with relative error smaller than 3% in all cases. Nevertheless, we must observe
that we consider just two cost structures and we do not have strong approximation results
for any of the proposed algorithms.

Acknowledgement. We would like to thank Gonzalo Navarro, who introduced us to this
problem. We also thank the referees for their helpful comments, in particular the one
who spotted a mistake in our implementation.
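
A Proof of Theorem 3

Proof. Let us consider an optimal search tree $T$ with height $H$ and let $P$ be a path from
some leaf to the root of $T$ that contains $H + 1$ nodes. Then, there is a sequence of nodes
of $P$, $a_{i_1}, a_{i_2}, \ldots, a_{i_{\lceil H/2 \rceil}}$, satisfying the two conditions below.

(i) $a_{i_j}$ is a descendant of $a_{i_{j+1}}$ for $j = 1, \ldots, \lceil H/2 \rceil - 1$.

(ii) Either $i_1 < i_2 < \cdots < i_{\lceil H/2 \rceil}$ or $i_1 > i_2 > \cdots > i_{\lceil H/2 \rceil}$.

We assume without loss of generality that $i_1 < i_2 < \cdots < i_{\lceil H/2 \rceil}$.
Now, let $a_{i'_j}$ be the descendant of the subtree rooted at $a_{i_j}$ with the smallest index.
Hence, $a_{i_j}$ is the root of the interval $[i'_j, .., i_{j+1} - 1]$. Observe that $i_{\lceil H/2 \rceil + 1} \le n + 1$.
Furthermore, $a_{i_{j+1}}$ is the root of the interval $[i'_{j+1}, .., i_{j+2} - 1]$. Since $T$ is an optimal
search tree and $a_{i_j}$ is a descendant of $a_{i_{j+1}}$, it follows from Theorem 1 that

$$\frac{c_{i_j}}{c_{i_{j+1}}} \ge \frac{i_j - i'_{j+1} + 1}{(i_{j+2} - i_{j+1}) + (i'_j - i'_{j+1})}, \quad \text{for } j = 1, \ldots, \lceil H/2 \rceil - 1 .$$

By multiplying the set of inequalities above, we obtain that

$$\frac{c_{i_1}}{c_{i_{\lceil H/2 \rceil}}} \ge \prod_{j=1}^{\lceil H/2 \rceil - 1} \frac{i_j - i'_{j+1} + 1}{(i_{j+2} - i_{j+1}) + (i'_j - i'_{j+1})} \qquad (2)$$

Hence, in order to give a lower bound for $c_{i_1} / c_{i_{\lceil H/2 \rceil}}$, we must minimize the right
side of (2) constrained to $1 \le i'_{\lceil H/2 \rceil} \le \cdots \le i'_1 \le i_1 < \cdots < i_{\lceil H/2 \rceil + 1} \le n + 1$.

First, we have that

$$\prod_{j=1}^{\lceil H/2 \rceil - 1} (i_j - i'_{j+1} + 1) \ge (\lceil H/2 \rceil - 1)! \qquad (3)$$

On the other hand, since

$$\sum_{j=1}^{\lceil H/2 \rceil - 1} \left[ (i_{j+2} - i_{j+1}) + (i'_j - i'_{j+1}) \right] \le n,$$

it follows from the inequality between the arithmetic and geometric means that

$$\prod_{j=1}^{\lceil H/2 \rceil - 1} \left[ (i_{j+2} - i_{j+1}) + (i'_j - i'_{j+1}) \right] \le \left( \frac{n}{\lceil H/2 \rceil - 1} \right)^{\lceil H/2 \rceil - 1} \qquad (4)$$

By combining (3) and (4), we obtain that

$$\frac{c_{i_1}}{c_{i_{\lceil H/2 \rceil}}} \ge \prod_{j=1}^{\lceil H/2 \rceil - 1} \frac{i_j - i'_{j+1} + 1}{(i_{j+2} - i_{j+1}) + (i'_j - i'_{j+1})} \ge (\lceil H/2 \rceil - 1)! \left( \frac{\lceil H/2 \rceil - 1}{n} \right)^{\lceil H/2 \rceil - 1} .$$ □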

Abstract. This paper addresses the informational asymmetry for constructing an
ultrametric evolutionary tree from upper and lower bounds on pairwise distances
between $n$ given species. We show that the tallest ultrametric tree exists and can
be constructed in $O(n^2)$ time, while the existence of the shortest ultrametric tree
depends on whether the lower bounds are ultrametric. The tallest tree construction
algorithm gives a very simple solution to the construction of an ultrametric tree.
We also provide an efficient $O(n^2)$-time algorithm for checking the uniqueness of
an ultrametric tree, and study a query problem for testing whether an ultrametric
tree satisfies both upper and lower bounds.



1 Introduction 

Constructing the evolutionary tree of a species set is a fundamental problem in
computational biology. Such trees describe how species are related to one another in terms of
common ancestors. A useful computational problem for constructing evolutionary trees
is the following: given an $n \times n$ distance matrix $M$ where $M_{ij}$ is the observed distance between
two species $i$ and $j$, find an edge-weighted evolutionary tree in which the distance $d_{ij}$ in
the tree between the leaves $i$ and $j$ equals $M_{ij}$. Pairwise distance measures carry some
uncertainty in practice. Thus, one seeks a tree that is close to the distance matrix, as
measured by various choices of optimization objectives [2, 3, 6, 8].

This paper focuses on the class of ultrametric trees [5, 7, 8, 9]. An ultrametric tree is
a rooted tree whose edges are weighted by non-negative numbers such that the lengths
of all the root-to-leaf paths, measured by summing the weights of the edges, are equal. A
distance matrix is ultrametric if an ultrametric tree can be constructed from this matrix.
Figure 1 shows an example of an ultrametric matrix and an ultrametric tree constructed
from this matrix.

In practice, when distance measures are uncertain, a distance is expressed as an
interval, defined by a lower bound and an upper bound for the true distance. From
such data, we obtain two distance matrices $M^l$ and $M^h$, representing pairwise distance
lower and upper bounds. The tree construction problem becomes that of constructing
an ultrametric tree whose pairwise distances fit between two bounds, i.e.,
$M^l_{ij} \le d_{ij} \le M^h_{ij}$. Farach, Kannan and Warnow [4] gave the first known algorithm for constructing
ultrametric trees from the sandwich distances.

* Supported in part by the Lipper Foundation.

** Supported in part by NSF Grant 9531028.

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 248-256, 1999.

(c) Springer-Verlag Berlin Heidelberg 1999

      a   b   c   d
  a   0   2   4   6
  b   2   0   4   6
  c   4   4   0   6
  d   6   6   6   0

Fig. 1. a. An ultrametric matrix M. b. An ultrametric tree for M.


This paper studies the informational asymmetry between lower and upper bounds 
in the construction of ultrametric trees. Our results are as follows: 

- Given an upper bound matrix, the tallest ultrametric tree, where the distance of
any two leaves reaches the maximum among all satisfying ultrametric trees, can be
constructed in $O(n^2)$ time. This result immediately leads to a new and simpler tree
construction algorithm than that of [4].

- Given a lower bound matrix, the shortest ultrametric tree, defined similarly to the
tallest ultrametric tree, exists if and only if the matrix is ultrametric.

- We provide an $O(n^2)$-time algorithm to check the uniqueness of an ultrametric tree
satisfying given upper and lower bounds.

- We study a query problem: if a lower bound matrix and an upper bound matrix
are given, how fast can we determine whether an ultrametric tree satisfies both
constraints? This problem is useful, for example, for developing interactive software
for finding the most suitable tree among many. We present an algorithm to test the
satisfaction of the upper bound constraints in $O(n)$ time after preprocessing the
upper bound matrix. A similarly fast algorithm for testing the lower bounds remains
open.



2 Notation 

Let $M$ represent a distance matrix, and $M^\ell$ and $M^h$ represent the lower bound and upper 
bound matrices for $M$. Let $d_{ij}$ represent the distance between leaf $i$ and leaf $j$ in a tree, 
defined as the length of the path connecting the two leaves. Given a tree $U$, $d^U_{ij}$ represents 
that distance in $U$. If $U$ is ultrametric, the height of $U$ is denoted by $h(U)$, or 
$h(r)$ if $r$ is the root of $U$. 

An $n \times n$ distance matrix $M$ corresponds to an undirected edge-weighted complete 
graph $G$ with $n$ vertices, where a vertex represents a species and the weight $w(i,j)$ of 
edge $(i,j)$ is $M_{ij}$. $G$ is also said to be the corresponding graph to $M$. For the lower bound 
matrix $M^\ell$ and the upper bound matrix $M^h$, their corresponding graphs are denoted by 
$G^\ell$ and $G^h$, and the weight functions are $w^\ell()$ and $w^h()$. 

3 Constructing the tallest ultrametric tree 

This section discusses the problem of finding the tallest ultrametric tree for any 
upper bound matrix. We give a simple $O(n^2)$-time algorithm that takes advantage of the 
minimum weight spanning tree. 

Fact 1 (see [1, 4, 8]) A matrix is ultrametric if and only if in the corresponding com- 
plete weighted graph, the largest weight edge in any cycle of more than one node is not 
unique. 

Proof. Straightforward. □ 

Suppose we have an upper bound matrix $M^h$ on the pairwise leaf-leaf distances of 
an ultrametric tree. There are many ultrametric trees satisfying $M^h$, but which one is the 
tallest? The following algorithm gives the answer. Let $G^h$ be the corresponding graph 
of $M^h$. 

Algorithm Compute_Tallest_Tree 

1. Construct the minimum weight spanning tree $T$ of $G^h$. 

2. Sort the edges of $T$ in decreasing order as $e_1, e_2, \ldots, e_{n-1}$. 

3. Return the tree $U$ constructed by the following procedure: 

(a) If $T$ is empty, return a leaf with zero height. 

(b) Otherwise, remove the first edge $e_1$ from $T$, leaving two subtrees, $T_1$ and $T_2$, 
each of which maintains its edges in decreasing order. 

(c) Recursively construct a tree $U_1$ from $T_1$, and a tree $U_2$ from $T_2$. 

(d) Construct a root $r$ for $U$ with height $h(U) = \frac{1}{2} w^h(e_1)$, and attach $U_1$ to $r$ with an 
edge weighted $\frac{1}{2} w^h(e_1) - h(U_1)$ and $U_2$ to $r$ with an edge weighted 
$\frac{1}{2} w^h(e_1) - h(U_2)$. 

(e) Return $U$. 
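
The steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes the upper bound matrix is given as a symmetric list of lists, the function names are hypothetical, and instead of recursing top-down on the largest edge it merges components bottom-up over the spanning tree edges in increasing order, which produces the same leaf-to-leaf distances. Each merge corresponds to a new root of height half the edge weight, so the two merged leaf sets end up at distance equal to that weight. The sketch returns the distance matrix of the tree rather than an explicit tree object.

```python
def minimum_spanning_tree(M):
    """Prim's algorithm on a dense symmetric matrix; O(n^2) time.

    Returns the n-1 tree edges as (weight, u, v) triples.
    """
    n = len(M)
    in_tree = [False] * n
    best = [float("inf")] * n    # cheapest edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((M[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and M[u][v] < best[v]:
                best[v], parent[v] = M[u][v], u
    return edges


def tallest_tree_distances(M):
    """Leaf-to-leaf distances of the tallest ultrametric tree under upper bounds M."""
    n = len(M)
    D = [[0] * n for _ in range(n)]
    cluster = [[i] for i in range(n)]   # leaves of each component, stored at its representative
    rep = list(range(n))                # rep[i]: representative of the component containing i
    for w, a, b in sorted(minimum_spanning_tree(M)):
        ra, rb = rep[a], rep[b]
        # A new root of height w/2 joins the two components.
        for x in cluster[ra]:
            for y in cluster[rb]:
                D[x][y] = D[y][x] = w
        if len(cluster[ra]) < len(cluster[rb]):
            ra, rb = rb, ra             # relabel the smaller component
        for y in cluster[rb]:
            rep[y] = ra
        cluster[ra].extend(cluster[rb])
        cluster[rb] = []
    return D
```

Every pair of leaves is assigned a distance exactly once, so the merging phase stays within the quadratic bound of the spanning tree computation.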

This algorithm constructs an ultrametric tree in $O(n^2)$ time: 

Lemma 2. Algorithm Compute_Tallest_Tree runs in $O(n^2)$ time and returns an ultrametric 
tree satisfying the given upper bound matrix. 

Proof. The algorithm takes $O(n^2)$ time to build the minimum weight spanning tree $T$, 
and $O(n \log n)$ time to sort the edges. Maintaining the edges in the subtrees of $T$ in 
decreasing order (at Step 3b) takes $O(n\,\alpha(n,n))$ time by using the disjoint-set forest 
data structure, where $\alpha()$ is the inverse of Ackermann's function, and $\alpha(n,n) < 4$ in 
any conceivable application. Here we implicitly use the fact that any connected subtree 
of a minimum spanning tree is still a minimum spanning tree. Therefore, the total time 
complexity is $O(n^2)$. At Step 3d, if both $U_1$ and $U_2$ are ultrametric, $U$ is ultrametric 
because the distance from root $r$ to any leaf is $\frac{1}{2} w^h(e_1)$. Step 3a returns a tree that has 
only one leaf and is ultrametric, so by recursion $U$ is ultrametric. $U$ also satisfies the 
upper bound matrix: for any two leaves $x$ of $U_1$ and $y$ of $U_2$, the weight $w^h(x,y)$ of the edge 
$(x,y)$ in graph $G^h$ is at least $w^h(e_1)$, which equals $d^U_{xy}$; otherwise, we could replace $e_1$ 
by $(x,y)$ in $T$ and obtain a lighter spanning tree. □ 

Theorem 3. The tree $U$ constructed by Algorithm Compute_Tallest_Tree is the tallest 
ultrametric tree. 



Proof. Let $x$ and $y$ be any two leaves of $U$, and $c$ be their lowest common ancestor. Let 
$(a,b)$ be the edge in the minimum spanning tree $T$ whose removal generates $c$ at Step 
3d of Algorithm Compute_Tallest_Tree. Then, 

$d^U_{xy} = 2h(c) = w^h(a,b) = M^h_{ab}$. 

[Figure panels: Ultrametric Tree U; Minimum Spanning Tree T] 

Fig. 2. Constructing an ultrametric tree using a minimum spanning tree 

Let $U_1$ and $U_2$ be the two immediate subtrees of $c$, which are constructed from two 
subtrees $T_1$ and $T_2$ of $T$, respectively. Assume that leaf $a$ and leaf $x$ are under $U_1$, and leaf 
$b$ and leaf $y$ are under $U_2$, as shown in Figure 2. Note that $T_1$ is a connected component 
and thus contains a path from $x$ to $a$, and similarly $T_2$ contains a path from $b$ to $y$. 
Combining them with the edge $(a,b)$, we obtain a path $P$ from $x$ to $y$, i.e., 

$P = x - v_1 - \cdots - v_i - a - b - v_{i+1} - \cdots - v_j - y$. 

Note that $(a,b)$ is selected in the algorithm because its weight is at least as large as that of any other 
edge in $T_1 \cup T_2$. Thus, 

$\max_{(w,z) \in P} M^h_{wz} = M^h_{ab}$. 

Let $V$ be any other satisfying ultrametric tree. By Fact 1, $d^V_{xy}$ should not be the sole largest 
distance in the cycle $P \cup (x,y)$. Thus, 

$d^V_{xy} \le \max_{(w,z) \in P} d^V_{wz} \le \max_{(w,z) \in P} M^h_{wz}$. 

Combining all the inequalities with $d^U_{xy} = M^h_{ab}$, we get $d^V_{xy} \le d^U_{xy}$, and $U$ is the tallest ultrametric tree. □ 

Theorem 3 immediately leads to the following theorem: 

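Theorem 4. Algorithm Compute_Tallest_Tree constructs an ultrametric tree satisfying 
both lower bound and upper bound constraints, if one exists. 

Proof. Algorithm Compute_Tallest_Tree constructs the tallest ultrametric tree $U$ for the 
upper bounds. If lower bounds are also given, either $U$ satisfies the lower bounds, or no 
tree satisfies them: every ultrametric tree $V$ satisfying the upper bounds has 
$d^V_{xy} \le d^U_{xy}$ for all leaves $x$ and $y$, so a lower bound violated by $U$ is violated by $V$ as well. □ 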
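
Theorem 4 reduces the sandwich problem to one comparison against the tallest tree. The sketch below illustrates that test under stated assumptions: matrices are symmetric lists of lists, the function names are hypothetical, and the tallest distances are computed by a cubic-time minimax (bottleneck) closure rather than by the $O(n^2)$ algorithm of this section. The substitution relies on the fact that the largest edge on a minimum spanning tree path is the minimax path value, which is the quantity the tallest tree assigns to each pair of leaves.

```python
def tallest_distances(H):
    """Minimax closure of the upper bound matrix H (O(n^3), for illustration only)."""
    n = len(H)
    D = [row[:] for row in H]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Best path from i to j through k, measured by its largest edge.
                D[i][j] = min(D[i][j], max(D[i][k], D[k][j]))
    return D


def sandwich_tree_exists(L, H):
    """True iff the tallest tree for H also respects every lower bound in L."""
    D = tallest_distances(H)
    n = len(H)
    return all(L[i][j] <= D[i][j] for i in range(n) for j in range(n))
```

Note that entrywise $L \le H$ is not sufficient: a lower bound can sit below its own upper bound and still exceed the distance that the tallest tree allows.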

4 Asymmetry between upper and lower bounds 

We have shown how to construct the tallest ultrametric tree in Section 3. However, the 
shortest ultrametric tree may not exist. The following lemma explains the asymmetry 
between upper and lower bounds. 

Lemma 5. There exists a lower bound matrix L such that for any ultrametric tree V 
that satisfies L, there is an upper bound matrix H which V cannot satisfy but some 
ultrametric tree U satisfies both L and H. 

Proof. Consider a lower bound matrix $L$ having three elements $a$, $b$ and $c$, whose pairwise 
distances $L_{ab} = x$, $L_{bc} = y$ and $L_{ac} = z$ satisfy $x > y$ and $x > z$. Let $d_m$ be the maximum distance in 
$L$. We construct two upper bound matrices $H_1$ and $H_2$, where every distance equals $d_m$ 
except the three distances among $a$, $b$ and $c$: in $H_1$ they are $(ab, bc, ac) = (x, x, z)$, and in 
$H_2$ they are $(ab, bc, ac) = (x, y, x)$. 

Every element in $H_1$ is at least as large as the corresponding element in $L$, and $H_1$ is 
ultrametric by Fact 1. Thus, there are ultrametric trees satisfying $L$ and $H_1$. Let $V$ be 
such a tree. Then $d^V_{ab}$ and $d^V_{ac}$ are fixed: $d^V_{ab} = x$ and $d^V_{ac} = z$. By Fact 1, $d^V_{bc} = x$, which is 
larger than the corresponding element $y$ in $H_2$, so $V$ does not satisfy $H_2$. Similarly, $H_2$ is 
ultrametric and any ultrametric tree satisfying $L$ and $H_2$ does not satisfy $H_1$. Therefore, 
no ultrametric tree satisfies both $H_1$ and $H_2$ at the same time. □ 

By Lemma 5, a lower bound matrix does not have a shortest ultrametric tree if there 
exists a three-element cycle whose largest value is unique. On the other hand, if the 
largest value of every three-element cycle is non-unique, then the same holds for every cycle: 

Lemma 6. If every three-element cycle in a matrix has a non-unique largest value, so 
does every cycle in the matrix. 

Proof. Let $M$ be a matrix where the largest value in any three-element cycle is not 
unique. Assume there exists a cycle $C = (v_1, v_2, \ldots, v_k, v_1)$, in which $k > 3$ and $M_{v_1 v_2}$ 
is the unique largest value. We decompose $C$ into three-element cycles: $(v_1, v_2, v_3, v_1)$, 
$(v_1, v_3, v_4, v_1)$, ..., and $(v_1, v_{k-1}, v_k, v_1)$. Since the largest value in any of them is not 
unique, the first cycle has $M_{v_1 v_2} = M_{v_1 v_3}$ because $M_{v_1 v_2} > M_{v_2 v_3}$ by the assumption. 
Similarly, $M_{v_1 v_i} = M_{v_1 v_{i+1}}$ in the cycle $(v_1, v_i, v_{i+1}, v_1)$, for $i = 2, \ldots, k-1$. Combining these results, 
$M_{v_1 v_2} = M_{v_1 v_k}$, which contradicts our assumption. Thus the largest value in any cycle of 
$M$ is not unique. □ 

Theorem 7. A lower bound matrix has a shortest ultrametric tree if and only if it is 
ultrametric. 

Proof. If a lower bound matrix is ultrametric, by definition, an ultrametric tree can 
be constructed from this matrix. This tree is the shortest. If a lower bound matrix has a 
shortest ultrametric tree, by the proof of Lemma 5, the largest value in any three-element 
cycle is not unique. By Lemma 6, the same holds for any cycle in the matrix. Finally, by Fact 1 
this matrix is ultrametric. □ 
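
Fact 1 together with Lemma 6 means that ultrametricity can be tested on triples alone, which in turn decides whether a lower bound matrix has a shortest tree (Theorem 7). A minimal sketch, assuming a symmetric list-of-lists matrix and a hypothetical function name:

```python
from itertools import combinations


def is_ultrametric(M):
    """Check that the largest value in every three-element cycle is not unique."""
    for i, j, k in combinations(range(len(M)), 3):
        low, mid, high = sorted((M[i][j], M[i][k], M[j][k]))
        if mid != high:          # the largest value is unique
            return False
    return True
```

The check runs in $O(n^3)$ time, which is enough for illustration; it is not claimed to be the fastest possible test.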

5 Uniqueness of ultrametric tree 

Given $M^\ell$ and $M^h$, is the ultrametric tree built by Algorithm Compute_Tallest_Tree 
topologically unique? Knowing the uniqueness is of vital importance because it indicates the 
significance of the tree. 

We first define what kind of trees have the same topological structure. An 
edge-weighted tree is compact if it has no zero-weight edge. A tree can be compacted by 
merging any two internal nodes into a single node if they are connected by a 
zero-weight edge. A compact ultrametric tree is an ultrametric tree that is compact. Any ultrametric tree 
can be converted into a compact ultrametric tree by merging, but the resulting tree may 
not be binary. Assume all the trees have the same set of labeled leaves. An internal node 
can be represented by a leaf set, consisting of all the descendant leaves of the node. A 
tree can be represented as the superset of leaf sets, where every element in the superset 
corresponds to an internal node, and vice versa. Two trees are equivalent or have the 
same topological structure if their representing supersets are the same, in other words, if 
there is a one-to-one mapping between all nodes that preserves the parent-child relation. 
Two compact ultrametric trees may have the same topological structure even though 
their edge weights are completely different. In discussing topological structures, the 
edge weights are ignored, because equivalent evolutionary trees give the same 
evolutionary process, and the differences in distances are usually caused by measuring errors. 

Given $M^\ell$ and $M^h$, we construct an ultrametric tree from $M^h$ by Algorithm 
Compute_Tallest_Tree, and then convert it into a compact tree $U$. We can check the 
uniqueness of $U$ by the following lemma: 

Lemma 8. For given $M^\ell$ and $M^h$, the compact ultrametric tree $U$, constructed from 
Algorithm Compute_Tallest_Tree, is unique if and only if for every internal node $u$, any 
two children $u_i$ and $u_j$ of $u$ satisfy $\max M^\ell_{xy} = 2h(u)$, where the maximum is over any leaf $x$ under $u_i$ and any 
leaf $y$ under $u_j$. 

Proof. Suppose that $M^\ell$ and $M^h$ define a unique compact ultrametric tree. Since $U$ is 
the tallest compact ultrametric tree for $M^h$, it satisfies $M^\ell$ and is unique. Assume the 
lemma does not hold; then there exist two different children $u_i$ and $u_j$ of $u$, such that 
$\max M^\ell_{xy} < 2h(u)$ over all leaves $x$ under $u_i$ and $y$ under $u_j$. Let $d_{u_i u_j} = \max M^\ell_{xy}$ and 
$d_m = \max(\frac{1}{2} d_{u_i u_j}, h(u_i), h(u_j))$. We can construct a different tree from $U$ by deleting the two 
children $u_i$ and $u_j$ from $u$, and replacing them by a child $u'$ that has the two children $u_i$ and $u_j$, with 
$h(u') = \frac{1}{2}(h(u) + d_m)$. This tree is a compact ultrametric tree because $h(u) > h(u') > d_m$, 
and it is topologically different from $U$, contradicting the uniqueness of $U$. 

Conversely, if any two children $u_i$ and $u_j$ of $u$ satisfy $\max M^\ell_{xy} = 2h(u)$ over any 
leaf $x$ under $u_i$ and any leaf $y$ under $u_j$, we claim that $U$ is unique. Assume there exists 
another topologically different tree $V$. Let $d^U_{xy}$ be the minimum value among those that 
satisfy $d^V_{xy} < d^U_{xy}$ ($U$ is the tallest). Such a $d^U_{xy}$ must exist; otherwise $V$ equals $U$ because all 
pairwise distances of $V$ are equal to those of $U$. Let $u$ be the least common ancestor of 
$x$ and $y$ in $U$, $u_x$ be the child of $u$ that contains $x$ as a descendant, and $u_y$ be the child of 
$u$ that contains $y$ as a descendant. Let $S_x$ be the set of leaves under $u_x$ and $S_y$ be the set 
of leaves under $u_y$. 

For any $w \in S_x$, $d^V_{xw} \le d^U_{xw} \le 2h^U(u_x) < d^U_{xy}$, and for any $z \in S_y$, 
$d^V_{yz} \le d^U_{yz} \le 2h^U(u_y) < d^U_{xy}$. By Lemma 6 and Fact 1, $d^V_{xz} \le \max(d^V_{xy}, d^V_{yz}) < d^U_{xy}$. Thus, in $V$, the distance between 
$x$ and any other leaf in $S_x \cup S_y$ is less than $d^U_{xy}$. So is the distance of any pair of leaves 
in $S_x \cup S_y$. However, this contradicts the condition in the lemma that there exists a pair of 
leaves $w \in S_x$ and $z \in S_y$ with $M^\ell_{wz} = 2h^U(u) = d^U_{xy}$. Hence, $V$ cannot exist, and $U$ is 
unique. □ 



Theorem 9. Given $M^\ell$ and $M^h$ as input, we can determine the uniqueness of ultrametric 
trees in $O(n^2)$ time. 

Proof. We can construct a compact ultrametric tree $U$ in $O(n^2)$ time by Algorithm 
Compute_Tallest_Tree, and then check the conditions of every internal node in a 
child-parent topological order in $O(n^2)$ time by Lemma 8, because every pair of leaves is 
visited exactly once. □ 

6 Query of ultrametric trees 

Assuming that we have obtained new ways to estimate the evolutionary distances 
between species (by some new evidence), and there are many previously computed 
ultrametric trees (by other evidence), finding the suitable trees among them may suggest 
new relations between these pieces of evidence. This section studies this query problem: given 
$M^\ell$ and $M^h$, and an ultrametric tree $V$, does $V$ satisfy $M^\ell$ and $M^h$? A naive algorithm 
runs in $O(n^3)$ time by checking $O(n^2)$ pairs of leaves in $V$ and calculating the distance 
of each pair in $O(n)$ time. We can improve $O(n^3)$ to $O(n^2)$ by pre-calculating the lowest 
common ancestor in linear time, and thus finding the distance of each pair of leaves in 
constant time. If preprocessing is permitted, is there an algorithm faster than $O(n^2)$? 

Lemma 10. We can preprocess the upper bound matrix $M^h$ in $O(n^2)$ time, so that for 
any given ultrametric tree $V$, whether $V$ satisfies $M^h$ can be determined in $O(n)$ time. 

Proof. Assume we have built the tallest ultrametric tree $U$ by Algorithm 
Compute_Tallest_Tree, and have calculated the lowest common ancestor (lca) function for $V$ in linear 
time. We define a recursive function $f$ to map a node in $U$ to a node in $V$: 

$f(u) = v$ if $u$ is a leaf in $U$, $v$ is a leaf in $V$, and $u = v$; 
$f(u) = \mathrm{lca}(f(u_l), f(u_r))$ if $u$ has children $u_l$ and $u_r$. 

For each internal node $u$, $f$ returns an internal node $v$ under which the leaves form a 
superset of the leaves under $u$. By the lca function, $v$ is the lowest node whose leaves form 
the minimum superset. 

We sort all the leaves and internal nodes of $U$ into a topological order $u_1, u_2, \ldots$, 
where a node appears before its parent, in $O(n)$ time by a depth-first search. If 
$h^V(f(u_i)) \le h^U(u_i)$ holds for each $u_i$, then $V$ satisfies $M^h$. We next show this in two steps. First, because 
$u_1, u_2, \ldots$ follow the topological order (child to parent), we can calculate $v_1 = f(u_1), v_2 = f(u_2), \ldots$ 
in this order in $O(n)$ time. Second, if $u$ has two children $u_1$ and $u_2$ and $v = f(u)$, 
then $h^V(v) \le h^U(u)$ indicates that for any pair of leaves $w$ under $u_1$ and $z$ under $u_2$, 

$d^V_{wz} \le 2h^V(v) \le 2h^U(u) = d^U_{wz}$. 

Visiting every node in $U$ and checking the 
corresponding inequality is equivalent to comparing the distance of every pair of leaves 
in $U$ with that in $V$. If all of them hold, $V$ satisfies $M^h$. Otherwise, if $h^V(v) > h^U(u)$, 
there must exist a leaf $w$ under $u_1$ and a leaf $z$ under $u_2$ such that $d^V_{wz} > d^U_{wz}$, violating the 
assumption that $U$ is the tallest (by construction) in Theorem 3. 

The preprocessing takes $O(n^2)$ time and $O(n)$ space to construct $U$ and sort a 
topological order. To answer a query, it takes $O(n)$ time and $O(n)$ space to calculate function 
$f$, and the same for visiting $O(n)$ nodes of $U$ and evaluating $O(n)$ inequalities. □ 
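
The query procedure in the proof can be sketched as follows. This is a simplified illustration under stated assumptions: the `Node` class and all function names are hypothetical, both trees use the same leaf labels, and the lowest common ancestor is found by naively walking parent pointers. The walk costs time proportional to the tree depth per call, so the sketch does not achieve the $O(n)$ query bound, which needs a constant-time lca structure.

```python
class Node:
    """Tree node; `height` is the distance to any leaf below, leaves carry a label."""

    def __init__(self, height=0.0, children=(), label=None):
        self.height, self.children, self.label = height, list(children), label
        self.parent, self.depth = None, 0
        for child in self.children:
            child.parent = self


def index_tree(root):
    """Set depths below `root` and return a dict from leaf label to leaf node."""
    leaves, stack = {}, [root]
    while stack:
        node = stack.pop()
        if not node.children:
            leaves[node.label] = node
        for child in node.children:
            child.depth = node.depth + 1
            stack.append(child)
    return leaves


def lca(a, b):
    """Naive lowest common ancestor: repeatedly lift the deeper node."""
    while a is not b:
        if a.depth >= b.depth:
            a = a.parent
        else:
            b = b.parent
    return a


def satisfies_upper_bounds(tallest_root, query_root):
    """True iff h^V(f(u)) <= h^U(u) for every internal node u of the tallest tree U."""
    leaves_of_v = index_tree(query_root)
    ok = True

    def f(u):
        nonlocal ok
        if not u.children:
            return leaves_of_v[u.label]
        image = f(u.children[0])
        for child in u.children[1:]:
            image = lca(image, f(child))
        if image.height > u.height:
            ok = False
        return image

    f(tallest_root)
    return ok
```

The recursion visits children before parents, which matches the child-to-parent topological order used in the proof.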

However, for lower bound matrices, whether there is an algorithm better than $O(n^2)$ 
time remains an open question. Unlike upper bound matrices, which have the tallest 
ultrametric tree, lower bound matrices may have multiple minimal trees as shown in Section 
4. This asymmetry prevents the linear time checking algorithm in Lemma 10 from 
being applicable to the lower bounds. 

Open Problem. By preprocessing, is there an algorithm that can test whether a tree 
satisfies a lower bound matrix faster than $O(n^2)$ time? 
Abstract. What is the minimum number of yes-no questions needed to find an 
$m$ bit number $x$ in the set $S = \{0, 1, \ldots, 2^m - 1\}$ if up to $\ell$ answers may be 
erroneous/false? In case when the $(t+1)$th question is adaptively asked after 
receiving the answer to the $t$th question, the problem, posed by Ulam and Rényi, is a 
chapter of Berlekamp's theory of error-correcting communication with feedback. 
It is known that, with finitely many exceptions, one can find $x$ asking 
Berlekamp's minimum number $q_\ell(m)$ of questions, i.e., the smallest integer $q$ such that 
$2^q \ge 2^m \left( \binom{q}{\ell} + \cdots + \binom{q}{2} + q + 1 \right)$. At the opposite, nonadaptive extreme, 
when all questions are asked in a unique batch before receiving any answer, a 
search strategy with $q_\ell(m)$ questions is the same as an $\ell$-error correcting code of 
length $q_\ell(m)$ having $2^m$ codewords. Such codes in general do not exist for $\ell > 1$. 
Focusing attention on the case $\ell = 2$, we shall show that, with the exception of 
$m = 2$ and $m = 4$, one can always find an unknown $m$ bit number $x \in S$ by 
asking $q_2(m)$ questions in two nonadaptive batches. Thus the results of our paper 
provide shortest strategies with as little adaptiveness/interaction as possible. 



1 Introduction 

Consider the following game: Two players, Paul and Carole, first fix a search space 
$S = \{0, 1, \ldots, 2^m - 1\}$. Now Carole thinks of a number $x \in S$, and Paul must find out 
$x$ by asking questions to which Carole can only answer yes or no. Assuming Carole is 
allowed to lie (or just to be inaccurate) in up to $\ell$ answers, what is the minimum number 
of questions needed by Paul to infallibly guess $x$? 

When the $t$th question is adaptively asked knowing the answer to the $(t-1)$th 
question, the problem is generally referred to as the Ulam-Rényi problem, [13, p.47], [15, 
p.281], and naturally fits into Berlekamp's theory of error-correcting communication 
with feedback [3] (also see [7] for a survey). Optimal solutions can be found in [11], [6], 
[10], and [14], respectively for the case $\ell = 1$, $\ell = 2$, $\ell = 3$, and for the general case. If 
one allows queries having $k$ possible answers, the corresponding Ulam-Rényi problems 
on $k$-ary search with lies are solved in [1] for the case $\ell = 1$, and in [5] for the case 
$\ell = 2$. 

At the other, nonadaptive extreme, when all questions are asked in advance, the 
Ulam-Rényi problem amounts to finding an $\ell$-error correcting code with $|S|$ many codewords 
of shortest length, where $|S|$ denotes the number of elements of $S$. As is well known, 
for $\ell = 1$ Hamming codes yield optimal searching strategies; indeed, Pelc [12] shows 
that adaptiveness is irrelevant even under the stronger assumption that repetition of the 
same question is forbidden. By contrast, for $\ell > 1$ a multitude of results (see, e.g., [8]) 
shows that nonadaptive search over a search space with $2^m$ elements generally cannot 
be implemented by strategies using $q_\ell(m)$ questions. 

In this paper we shall prove that when $\ell = 2$, searching strategies with $q_2(m)$ 
questions do exist with the least possible degree of adaptiveness/interaction. Specifically, 
Paul can infallibly guess Carole's secret number $x \in S$ by asking a first batch of $m$ 
nonadaptive questions, and then, only depending on the $m$-tuple of Carole's answers, 
asking a second mini-batch of $n$ nonadaptive questions. Our strategies are the shortest 
possible, in that $m + n$ coincides with Berlekamp's lower bound $q_2(m)$, the number of 
questions that are necessary to accommodate all possible answering strategies of Carole. 

Since Paul is allowed to adapt his strategy only once, the results of our paper yield 
shortest 2-fault tolerant search strategies with minimum adaptiveness. 



2 The Ulam-Renyi Game 

Questions, answers, states, strategies 

By a question $T$ we understand an arbitrary subset $T$ of $S$. The opposite question is 
the complementary set $S \setminus T$. In case Carole's answer is "yes", numbers in $T$ are said 
to satisfy Carole's answer, while numbers in $S \setminus T$ falsify it. Carole's negative answer 
to $T$ has the same effect as a positive answer to the opposite question $S \setminus T$. Suppose 
questions $T_1, \ldots, T_t$ have been asked, and answers $b_1, \ldots, b_t$ have been received from 
Carole ($b_i \in \{\text{no}, \text{yes}\}$). Then a number $y \in S$ must be rejected from consideration if, 
and only if, it falsifies 3 or more answers. The remaining numbers of $S$ still are possible 
candidates for the unknown $x$. All that Paul knows (Paul's state of knowledge) is a 
triplet $\sigma = (A_0, A_1, A_2)$ of pairwise disjoint subsets of $S$, where $A_i$ is the set of numbers 
falsifying $i$ answers, $i = 0, 1, 2$. The initial state is naturally given by $(S, \emptyset, \emptyset)$. A state 
$(A_0, A_1, A_2)$ is final iff $A_0 \cup A_1 \cup A_2$ is empty, or has exactly one element. For any state 
$\sigma = (A_0, A_1, A_2)$ and question $T \subseteq S$, the two states $\sigma^{yes}$ and $\sigma^{no}$ respectively resulting 
from Carole's positive or negative answer, are given by 

$\sigma^{yes} = (A_0 \cap T,\ (A_0 \setminus T) \cup (A_1 \cap T),\ (A_1 \setminus T) \cup (A_2 \cap T))$ (1) 

$\sigma^{no} = (A_0 \setminus T,\ (A_0 \cap T) \cup (A_1 \setminus T),\ (A_1 \cap T) \cup (A_2 \setminus T))$. (2) 

Turning attention to questions $T_1, \ldots, T_t$ and answers $\vec{b} = b_1, \ldots, b_t$, iterated application 
of the above formulas yields a sequence of states 

$\sigma_0 = \sigma,\ \sigma_1 = \sigma_0^{b_1},\ \sigma_2 = \sigma_1^{b_2},\ \ldots,\ \sigma_t = \sigma_{t-1}^{b_t}$. (3) 
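
The update rules (1)-(2) translate directly into set operations. The following sketch is illustrative only: a state is assumed to be a tuple of three Python sets, and the function names are hypothetical. A "no" answer is handled as a "yes" to the opposite question, as described above; only membership among the surviving numbers matters, so the complement is taken within the current state.

```python
def answer(state, T, reply):
    """Return Paul's new state after Carole's reply to question T.

    `state` is (A0, A1, A2); `reply` is True for "yes", False for "no".
    """
    A0, A1, A2 = state
    if not reply:
        T = (A0 | A1 | A2) - T    # opposite question, restricted to survivors
    return (A0 & T,
            (A0 - T) | (A1 & T),
            (A1 - T) | (A2 & T))


def is_final(state):
    """A state is final iff at most one candidate number survives."""
    A0, A1, A2 = state
    return len(A0 | A1 | A2) <= 1
```

Numbers in $A_2$ that falsify the new answer appear in none of the three components, which models their rejection after a third falsified answer.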
By a strategy $\mathcal{S}$ with $q$ questions we mean the full binary tree of depth $q$, where 
each node $\nu$ is mapped into a question $T_\nu$, and the two edges $\eta_{left}, \eta_{right}$ generated 
by $\nu$ are respectively labeled yes and no. Let $\eta = \eta_1, \ldots, \eta_q$ be a path in $\mathcal{S}$, from 
the root to a leaf, with respective labels $b_1, \ldots, b_q$, generating nodes $\nu_1, \ldots, \nu_q$ and 
associated questions $T_{\nu_1}, \ldots, T_{\nu_q}$. Fixing an arbitrary state $\sigma$, iterated application of 
(1)-(2) naturally transforms $\sigma$ into $\sigma^\eta$ (where the dependence on the $b_j$ and $T_j$ is 
understood). We say that strategy $\mathcal{S}$ is winning for $\sigma$ iff for every path $\eta$ the state $\sigma^\eta$ is 
final. A strategy is said to be nonadaptive iff all nodes at the same depth of the tree are 
mapped into the same question. 

Type, weight, character, Berlekamp’s lower bound 

Let $\sigma = (A_0, A_1, A_2)$ be a state. For each $i = 0, 1, 2$ let $a_i = |A_i|$ be the number of 
elements of $A_i$. Then the triplet $(a_0, a_1, a_2)$ is called the type of $\sigma$. The Berlekamp 
weight of $\sigma$ before $q$ questions, $q = 0, 1, 2, \ldots$, is given by 

$w_q(\sigma) = a_0 \left( \binom{q}{2} + q + 1 \right) + a_1 (q + 1) + a_2$. (4) 

The character $ch(\sigma)$ is the smallest integer $q \ge 0$ such that $w_q(\sigma) \le 2^q$. By abuse of 
notation, the weight of any state $\sigma$ of type $(a_0, a_1, a_2)$ before $q$ questions shall be denoted 
$w_q(a_0, a_1, a_2)$. Similarly, the character of a state $\sigma = (A_0, A_1, A_2)$ of type $(a_0, a_1, a_2)$ 
will also be denoted $ch(a_0, a_1, a_2)$. As an immediate consequence of the definition we 
have the following monotonicity properties: For any two states $\sigma' = (A'_0, A'_1, A'_2)$ and 
$\sigma'' = (A''_0, A''_1, A''_2)$ respectively of type $(a'_0, a'_1, a'_2)$ and $(a''_0, a''_1, a''_2)$, if $a'_i \le a''_i$ for all 
$i = 0, 1, 2$ then 

$ch(\sigma') \le ch(\sigma'')$ and $w_q(\sigma') \le w_q(\sigma'')$ (5) 

for each $q \ge 0$. Note that $ch(\sigma) = 0$ iff $\sigma$ is a final state. 
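
Definition (4) and the character are easy to evaluate numerically. A minimal sketch, with hypothetical function names, that takes the type $(a_0, a_1, a_2)$ of a state as three integers:

```python
from math import comb


def weight(q, a0, a1, a2):
    """Berlekamp weight (4) of a state of type (a0, a1, a2) before q questions."""
    return a0 * (comb(q, 2) + q + 1) + a1 * (q + 1) + a2


def character(a0, a1, a2):
    """Smallest q >= 0 with weight(q, ...) <= 2**q."""
    q = 0
    while weight(q, a0, a1, a2) > 2 ** q:
        q += 1
    return q
```

The loop terminates because the weight grows only quadratically in $q$ while $2^q$ grows exponentially. For a final state, with at most one element in total, the loop exits immediately at $q = 0$.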

The proof of the following results goes back to [ 3 ]. 

Lemma 1. Let $\sigma$ be an arbitrary state and $T \subseteq S$ a question. Let $\sigma^{yes}$ and $\sigma^{no}$ be as 
in (1)-(2). We then have 

(i) (Conservation Law). For any integer $q \ge 1$, 

$w_q(\sigma) = w_{q-1}(\sigma^{yes}) + w_{q-1}(\sigma^{no})$. 

(ii) (Berlekamp's lower bound) If $\sigma$ has a winning strategy with $q$ questions then $q \ge ch(\sigma)$. 
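
The Conservation Law can be checked directly from (1), (2) and (4). The following derivation is a sketch added for illustration; the symbols $t_i = |A_i \cap T|$ are introduced here and do not appear in the original. The types of $\sigma^{yes}$ and $\sigma^{no}$ are $(t_0,\ a_0 - t_0 + t_1,\ a_1 - t_1 + t_2)$ and $(a_0 - t_0,\ t_0 + a_1 - t_1,\ t_1 + a_2 - t_2)$, so their components sum to $a_0$, $a_0 + a_1$ and $a_1 + a_2$ respectively.

```latex
\begin{align*}
w_{q-1}(\sigma^{yes}) + w_{q-1}(\sigma^{no})
  &= a_0\left(\tbinom{q-1}{2} + q\right) + (a_0 + a_1)\,q + (a_1 + a_2) \\
  &= a_0\left(\tbinom{q-1}{2} + 2q\right) + a_1 (q + 1) + a_2 \\
  &= a_0\left(\tbinom{q}{2} + q + 1\right) + a_1 (q + 1) + a_2 = w_q(\sigma),
\end{align*}
```

where the last step uses $\binom{q}{2} = \binom{q-1}{2} + (q - 1)$.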

A winning strategy for $\sigma$ with $q$ questions is said to be perfect iff $q = ch(\sigma)$. We shall 
often write $q_2(m)$ instead of $ch(2^m, 0, 0)$. Let $\sigma = (A_0, A_1, A_2)$ be a state. Let $T \subseteq S$ 
be a question. We then say that $T$ is balanced for $\sigma$ iff for each $j = 0, 1, 2$, we have 
$|A_j \cap T| = |A_j \setminus T|$. 

Lemma 2. Let $T$ be a balanced question for a state $\sigma = (A_0, A_1, A_2)$. Let $n = ch(\sigma)$. 
Let $\sigma^{yes}$ and $\sigma^{no}$ be as in (1)-(2) above. Then, for each integer $q \ge 0$, 

(i) $w_q(\sigma^{yes}) = w_q(\sigma^{no})$, 

(ii) $ch(\sigma^{yes}) = ch(\sigma^{no}) = n - 1$. 
3 Background from Coding Theory 

Let $x, y \in \{0,1\}^n$. The Hamming distance $d_H(x,y)$ is defined by $d_H(x,y) = |\{i \in 
\{1, \ldots, n\} \mid x_i \ne y_i\}|$, where, as above, $|A|$ denotes the number of elements of $A$. The 
Hamming sphere $B_r(x)$ with radius $r$ and center $x$ is the set of elements of $\{0,1\}^n$ whose 
Hamming distance from $x$ is $\le r$, in symbols, $B_r(x) = \{y \in \{0,1\}^n \mid d_H(x,y) \le r\}$. 
The Hamming weight $w_H(x)$ of $x$ is the number of nonzero digits of $x$. 

We refer to [8] for background in coding theory. Throughout this paper, by a code 
we shall mean a binary code, in the following sense: 

Definition 3. A (binary) code $C$ of length $n$ is a nonempty subset of $\{0,1\}^n$. Its elements 
are called codewords. The minimum distance of $C$ is given by $\delta(C) = \min\{d_H(x,y) \mid 
x, y \in C, x \ne y\}$. We say that $C$ is an $(n, m, d)$ code iff $C$ has length $n$, $|C| = m$ and $\delta(C) = 
d$. By definition, the minimum weight of $C$ is the minimum of the Hamming weights of its 
codewords, in symbols, $\mu(C) = \min\{w_H(x) \mid x \in C\}$. The minimum distance between 
two codes $C_1$ and $C_2$ is defined by $\Delta(C_1, C_2) = \min\{d_H(x,y) \mid x \in C_1, y \in C_2\}$. 

Lemma 4. Let $e, n, m_1, m_2$ be integers $> 0$. For each $i = 1, 2$ suppose $C_i$ to be an 
$(n, m_i, d_i)$ code such that $\mu(C_i) \ge e$, $d_i \ge 3$ and $\Delta(C_1, C_2) \ge 1$. Then there exist 
two $(n + 2, m_1 + m_2, d'_i)$ codes $D_i$ ($i = 1, 2$) such that $d'_i \ge 3$, $\mu(D_i) \ge e$ and 
$\Delta(D_1, D_2) \ge 1$. 

Proof. Let us define (*) 

$D_1 = C_1 \otimes (1,0) \cup C_2 \otimes (0,1)$, $D_2 = C_1 \otimes (0,0) \cup C_2 \otimes (1,1)$. (6) 

Obviously the length of $D_1$ and $D_2$ is $n + 2$, and $\mu(D_1), \mu(D_2) \ge e$. Because of 
$C_1 \otimes (1,0) \cap C_2 \otimes (0,1) = \emptyset$ and $C_1 \otimes (0,0) \cap C_2 \otimes (1,1) = \emptyset$, we get 

$|D_1| = |C_1 \otimes (1,0)| + |C_2 \otimes (0,1)| = m_1 + m_2 = |C_1 \otimes (0,0)| + |C_2 \otimes (1,1)| = |D_2|$. 

In order to show that $\delta(D_1) \ge 3$, note that any two codewords $x, y \in D_1$ can be written as 
$x = (x_1, \ldots, x_n, a_1, a_2)$ and $y = (y_1, \ldots, y_n, b_1, b_2)$, for some codewords $(x_1, \ldots, x_n)$, 
$(y_1, \ldots, y_n) \in C_1 \cup C_2$ and $(a_1, a_2), (b_1, b_2) \in \{(1,0), (0,1)\}$. It follows that 

$d_H(x,y) = d_H((x_1, \ldots, x_n), (y_1, \ldots, y_n)) + d_H((a_1, a_2), (b_1, b_2))$. (7) 

If either $(x_1, \ldots, x_n), (y_1, \ldots, y_n) \in C_1$ or $(x_1, \ldots, x_n), (y_1, \ldots, y_n) \in C_2$, then $(a_1, a_2) = 
(b_1, b_2)$, whence $d_H(x,y) = d_H((x_1, \ldots, x_n), (y_1, \ldots, y_n)) \ge 3$, because $\delta(C_1), \delta(C_2) \ge 
3$. If $(x_1, \ldots, x_n) \in C_1$ and $(y_1, \ldots, y_n) \in C_2$, then $(a_1, a_2) = (1,0)$ and $(b_1, b_2) = 
(0,1)$, whence $d_H((a_1, a_2), (b_1, b_2)) = 2$ and $d_H((x_1, \ldots, x_n), (y_1, \ldots, y_n)) \ge 1$, 
because $\Delta(C_1, C_2) \ge 1$. From (7) it follows that $d_H(x,y) \ge 3$. The case $(x_1, \ldots, x_n) \in 
C_2$, $(y_1, \ldots, y_n) \in C_1$ is symmetric. Thus $\delta(D_1) \ge 3$, as required. A similar argument 
shows that $\delta(D_2) \ge 3$. 

To conclude we must show $\Delta(D_1, D_2) \ge 1$. For any $x \in D_1$ and $y \in D_2$ let us write 
$x = (x_1, \ldots, x_n, a_1, a_2)$ and $y = (y_1, \ldots, y_n, b_1, b_2)$, for some codewords $(x_1, \ldots, x_n)$, 
$(y_1, \ldots, y_n) \in C_1 \cup C_2$ and $(a_1, a_2) \in \{(1,0), (0,1)\}$, $(b_1, b_2) \in \{(0,0), (1,1)\}$. 

Then the desired result follows from $d_H(x,y) \ge d_H((a_1, a_2), (b_1, b_2)) = 1$. 

* For any code $C$ of length $n$, we denote with $C \otimes (a_1, \ldots, a_k)$ the code of length $n + k$ whose 
codewords are obtained by adding the suffix $(a_1, \ldots, a_k)$ to all codewords of $C$, i.e., 
$C \otimes (a_1, \ldots, a_k) = \{(x_1, \ldots, x_n, a_1, \ldots, a_k) \mid (x_1, \ldots, x_n) \in C\}$. 
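
The construction (6) in the proof of Lemma 4 can be tried out on small codes. The sketch below is illustrative: codewords are assumed to be tuples of 0/1 integers, codes are Python sets of such tuples, and all function names are hypothetical.

```python
def hamming(x, y):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))


def min_distance(C):
    """Minimum distance between distinct codewords of C (C needs >= 2 codewords)."""
    return min(hamming(x, y) for x in C for y in C if x != y)


def min_weight(C):
    """Minimum Hamming weight over the codewords of C."""
    return min(sum(x) for x in C)


def code_distance(C1, C2):
    """Minimum distance between a codeword of C1 and a codeword of C2."""
    return min(hamming(x, y) for x in C1 for y in C2)


def extend(C, suffix):
    """Append the same suffix to every codeword of C."""
    return {x + suffix for x in C}


def lemma4_codes(C1, C2):
    """Build the two codes D1 and D2 of construction (6)."""
    D1 = extend(C1, (1, 0)) | extend(C2, (0, 1))
    D2 = extend(C1, (0, 0)) | extend(C2, (1, 1))
    return D1, D2
```

For example, with $C_1 = \{0111, 1000\}$ and $C_2 = \{1011, 0100\}$, both resulting codes have length 6 and four codewords, and the properties claimed by the lemma can be checked with the helper functions.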
4 Perfect Strategies with Minimum Interaction 

The first batch of questions 

Recall that $q_2(m) = ch(2^m, 0, 0)$ is the smallest integer $q \ge 0$ such that $2^q \ge 2^m \left( \binom{q}{2} + q + 1 \right)$. 
By Lemma 1(ii), at least $q_2(m)$ questions are necessary for Paul to guess the unknown 
number $x_{Carole} \in S = \{0, 1, \ldots, 2^m - 1\}$, if up to two answers may be erroneous. The aim 
of this paper is to prove that, conversely (with the exception of $m = 2$ and $m = 4$), 
$q_2(m)$ questions are sufficient under the following constraint: Paul first sends to Carole a 
predetermined batch of $m$ questions $D_1, \ldots, D_m$, and then, only depending on Carole's 
answers, he sends the remaining $q_2(m) - m$ questions in a second batch. 

The first batch of questions is easily described, as follows: For each $i = 1, 2, \ldots, m$, 
let $D_i \subseteq S$ denote the question "Is the $i$th binary digit of $x_{Carole}$ equal to 1?" Thus 
a number $y \in S$ belongs to $D_i$ iff the $i$th bit $y_i$ of its binary expansion $y = y_1 \ldots y_m$ 
is equal to 1. Identifying 1 = yes and 0 = no, let $b_i \in \{0,1\}$ be Carole's answer to 
question $D_i$. Let $b = b_1 \ldots b_m$. Repeated application of (1)-(2), beginning with the initial 
state $\sigma = (S, \emptyset, \emptyset)$, shows that Paul's state of knowledge as an effect of Carole's answers 
is a triplet $\sigma^b = (A_0, A_1, A_2)$, where 

$A_0$ = the singleton containing the number whose binary expansion equals $b$ 
$A_1 = \{y \in S \mid d_H(y,b) = 1\}$ 
$A_2 = \{y \in S \mid d_H(y,b) = 2\}$. 

By direct verification we have $|A_0| = 1$, $|A_1| = m$, $|A_2| = \binom{m}{2}$. Thus the state $\sigma^b$ has 
type $(1, m, \binom{m}{2})$. As in (3), let $\sigma_i$ be the state resulting after Carole's first $i$ answers, 
beginning with $\sigma_0 = \sigma$. Since each question $D_i$ is balanced for $\sigma_{i-1}$, an easy induction 
using Lemma 2 yields $ch(\sigma^b) = q_2(m) - m$. 
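
The description of the state after the first batch can be confirmed by brute force for small $m$. The sketch below is illustrative, with a hypothetical function name; it applies the update rules (1)-(2) to the $m$ bit questions, treating a "no" as a "yes" to the opposite question, and assumes Carole's answers are given as a tuple of 0/1 values with the most significant bit first.

```python
def first_batch_state(m, b):
    """State (A0, A1, A2) after the bit questions D_1..D_m with answers b."""
    S = set(range(2 ** m))
    A0, A1, A2 = set(S), set(), set()
    for i in range(m):
        # D_i: numbers whose i-th binary digit (most significant first) is 1.
        D = {y for y in S if (y >> (m - 1 - i)) & 1}
        T = D if b[i] == 1 else S - D
        A0, A1, A2 = A0 & T, (A0 - T) | (A1 & T), (A1 - T) | (A2 & T)
    return A0, A1, A2
```

For $m = 4$ and answers $b = 1011$, the three components are the numbers at Hamming distance 0, 1 and 2 from 11, giving the type $(1, 4, 6)$.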

The critical index $m_n$ 

For each $m$-tuple $b \in \{0,1\}^m$ of Carole's answers, we shall construct a nonadaptive 
strategy $\mathcal{S}_b$ with $ch(1, m, \binom{m}{2})$ questions, which turns out to be winning for the state $\sigma^b$. 
To this purpose, let us consider the values of $ch(1, m, \binom{m}{2})$ for $m \ge 1$. A direct 
computation yields $ch(1,1,0) = 4$, $ch(1,2,1) = 5$, $ch(1,3,3) = ch(1,4,6) = 6$, $ch(1,5,10) 
= \cdots = ch(1,8,28) = 7$, $ch(1,9,36) = \cdots = ch(1,14,91) = 8$, ... 

Definition 5. Let $n \ge 4$ be an arbitrary integer. The critical index $m_n$ is the largest 
integer $m \ge 0$ such that $ch(1, m, \binom{m}{2}) = n$. Thus, 

$ch\left(1, m_n, \binom{m_n}{2}\right) = n$ and $ch\left(1, m_n + 1, \binom{m_n + 1}{2}\right) > n$. (8) 

Lemma 6. Let $n \ge 4$ be an arbitrary integer. 

(i) If $n$ is odd then $m_n = 2^{\frac{n+1}{2}} - n - 1$. 

(ii) If $n$ is even then, letting $m^* = \lfloor 2^{\frac{n+1}{2}} \rfloor - n - 1$, we either have $m_n = m^*$, or 
$m_n = m^* + 1$. 

Proof. The case $n = 4$ is settled by direct verification, recalling that $m_4 = 1$. For $n \ge 5$ 
see [9, Lemma 4.2]. 
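
The critical index and the closed forms of Lemma 6 can be compared numerically for small $n$. The sketch below is illustrative, with hypothetical function names; it recomputes the character from definition (4) so that the block is self-contained, and evaluates $\lfloor 2^{(n+1)/2} \rfloor$ exactly as the integer square root of $2^{n+1}$.

```python
from math import comb, isqrt


def character(a0, a1, a2):
    """Smallest q >= 0 whose Berlekamp weight is at most 2**q."""
    q = 0
    while a0 * (comb(q, 2) + q + 1) + a1 * (q + 1) + a2 > 2 ** q:
        q += 1
    return q


def critical_index(n):
    """Largest m with ch(1, m, C(m,2)) <= n, found by scanning (assumes n >= 4)."""
    m = 0
    while character(1, m + 1, comb(m + 1, 2)) <= n:
        m += 1
    return m


def lemma6_candidates(n):
    """Values of the critical index allowed by Lemma 6."""
    if n % 2 == 1:
        return {2 ** ((n + 1) // 2) - n - 1}
    m_star = isqrt(2 ** (n + 1)) - n - 1
    return {m_star, m_star + 1}
```

The scan relies on the monotonicity property (5): once the character exceeds $n$ it never drops back. Both branches of the even case occur; for instance $m_6 = m^*$ while $m_8 = m^* + 1$.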
Strategies vs. Codes 

Lemma 7. Let $m = 1, 2, 3, \ldots$ and $n = ch(1, m, \binom{m}{2})$. Then the following conditions 
are equivalent: 

(a) For every state $\sigma = (A_0, A_1, A_2)$ of type $(1, m, \binom{m}{2})$ there is a nonadaptive winning 
strategy for $\sigma$ with $n$ questions; 

(b) For some state $\sigma = (A_0, A_1, A_2)$ of type $(1, m, \binom{m}{2})$ there is a nonadaptive winning 
strategy for $\sigma$ with $n$ questions; 

(c) For some integer $d \ge 3$ there exists an $(n, m, d)$ code with minimum Hamming weight 
$\ge 4$. 

Proof. The implication (a) $\Rightarrow$ (b) is trivial. To prove (b) $\Rightarrow$ (c) assume $\sigma = (A_0, A_1, A_2)$ 
to be a state of type $(1, m, \binom{m}{2})$ having a nonadaptive winning strategy $\mathcal{S}$ with $n$ questions 
$T_1, \ldots, T_n$. Let the map $z \in A_0 \cup A_1 \cup A_2 \mapsto z^{\mathcal{S}} \in \{0,1\}^n$ send each $z \in A_0 \cup A_1 \cup A_2$ 
into the $n$-tuple of bits $z^{\mathcal{S}} = z^{\mathcal{S}}_1 \ldots z^{\mathcal{S}}_n$ arising from the sequence of "true" answers to 
the questions "does $z$ belong to $T_1$?", "does $z$ belong to $T_2$?", ..., "does $z$ belong to 
$T_n$?" More precisely, for each $j = 1, \ldots, n$, $z^{\mathcal{S}}_j = 1$ iff $z \in T_j$. Let $C \subseteq \{0,1\}^n$ be the 
range of the map $z \mapsto z^{\mathcal{S}}$. By hypothesis, $A_0 = \{h\}$, for a unique element $h \in S$. Let $h^{\mathcal{S}}$ 
be the corresponding codeword in $C$. We shall first prove that, for some $d \ge 3$, the set 
$C_1 = \{y^{\mathcal{S}} \in C \mid y \in A_1\}$ is an $(n, m, d)$ code, and for every $z \in A_1$ we have the inequality 
$d_H(z^{\mathcal{S}}, h^{\mathcal{S}}) \ge 4$. The set $C_1$ will be finally used to build a code satisfying all conditions 
in (c). 

Since $\mathcal{S}$ is winning, the map $z \mapsto z^{\mathcal{S}}$ is one-one, whence in particular $|C_1| = m$. By 
definition, $C_1$ is a subset of $\{0,1\}^n$. 

Claim 1. $\delta(C_1) \ge 3$. 

For otherwise (absurdum hypothesis), assuming $c$ and $d$ to be two distinct elements 
of $A_1$ such that $d_H(c^{\mathcal{S}}, d^{\mathcal{S}}) \le 2$, we will prove that $\mathcal{S}$ is not a winning strategy. We can 
safely assume $c^{\mathcal{S}}_j = d^{\mathcal{S}}_j$ for each $j = 1, \ldots, n - 2$. Suppose Carole's secret number is 
equal to $c$. Suppose Carole's answer to question $T_j$ is "yes" or "no" according as $c^{\mathcal{S}}_j = 1$ 
or $c^{\mathcal{S}}_j = 0$, respectively. Then after Carole's $n - 2$ answers, Paul's state of knowledge 
has the form $\sigma' = (A'_0, A'_1, A'_2)$, with $\{c, d\} \subseteq A'_1$, whence the type of $\sigma'$ is $(a'_0, a'_1, a'_2)$ 
with $a'_1 \ge 2$. Since $ch(\sigma') \ge ch(0, 2, 0) = 3$, Lemma 1(ii) shows that the remaining two 
questions/answers will not suffice to reach a final state, thus contradicting the assumption 
that $\mathcal{S}$ is winning. 

Claim 2. For each $y \in A_1$ we have the inequality $d_H(y^{\mathcal{S}}, h^{\mathcal{S}}) \ge 4$. 

For otherwise (absurdum hypothesis) let $y \in A_1$ be a counterexample, and $d_H(y^{\mathcal{S}}, h^{\mathcal{S}}) 
\le 3$. Writing $y^{\mathcal{S}} = y^{\mathcal{S}}_1 \ldots y^{\mathcal{S}}_n$ and $h^{\mathcal{S}} = h^{\mathcal{S}}_1 \ldots h^{\mathcal{S}}_n$, it is no loss of generality to assume 
$h^{\mathcal{S}}_j = y^{\mathcal{S}}_j$, for all $j = 1, \ldots, n - 3$. Suppose Carole's secret number coincides with $h$. 
Suppose further that Carole's answer to question $T_j$ is "yes" or "no" according as 
$h^{\mathcal{S}}_j = 1$ or $h^{\mathcal{S}}_j = 0$, respectively. Then Paul's state of knowledge after these answers 
will have the form $\sigma'' = (A''_0, A''_1, A''_2)$, where $A''_0 = \{h\}$, and $y \in A''_1$. Therefore, since 
$ch(\sigma'') \ge ch(1, 1, 0) = 4$, Lemma 1(ii) again shows that three more questions will not 
suffice to find the unknown number. This contradicts the assumption that $\mathcal{S}$ is a winning 
strategy. 
Having thus settled our two claims, let $C_2 = \{y \oplus h^{\mathcal{S}} \mid y \in C_1\}$, where $\oplus$ stands for
bitwise sum modulo 2. For any two distinct codewords $a, b \in C_2$ we have $a = c \oplus h^{\mathcal{S}}$ and
$b = d \oplus h^{\mathcal{S}}$, for uniquely determined elements $c, d \in C_1$. From Claim 1 we get $d_H(a, b) =
d_H(c \oplus h^{\mathcal{S}}, d \oplus h^{\mathcal{S}}) = d_H(c, d) \geq 3$, whence $\delta(C_2) \geq 3$. Using the abbreviation
$\mathbf{0} = 0 \cdots 0$ ($n$ times), by Claim 2 we have

$w_H(a) = d_H(a, \mathbf{0}) = d_H(c \oplus h^{\mathcal{S}}, h^{\mathcal{S}} \oplus h^{\mathcal{S}}) = d_H(c, h^{\mathcal{S}}) \geq 4$,

whence $\mu(C_2) \geq 4$. In conclusion, $C_2$ is an $(n, m, d)$ code with $d \geq 3$ and $\mu(C_2) \geq 4$, as
required.

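(c) $\Rightarrow$ (a) Let $C$ be an $(n, m, d)$ code, with $d \geq 3$ and $\mu(C) \geq 4$. Let $\mathcal{H} \subseteq \{0,1\}^n$ be the
union of the Hamming spheres of radius 1 centered at the codewords of $C$, together with
the Hamming sphere of radius 2 centered at $\mathbf{0}$, in symbols, $\mathcal{H} = B_2(\mathbf{0}) \cup \bigcup_{x \in C} B_1(x)$.

Notice that, for distinct $x, y \in C$, the spheres $B_2(\mathbf{0})$, $B_1(x)$, $B_1(y)$ are pairwise disjoint.
Therefore $|\mathcal{H}| = \binom{n}{2} + n + 1 + m(n+1) = \binom{n}{2} + (m+1)(n+1)$.

Let $C_1 = \{0,1\}^n \setminus \mathcal{H}$. Then $|C_1| = 2^n - \binom{n}{2} - (m+1)(n+1) \geq \binom{m}{2}$. The second
inequality follows from $2^n \geq \binom{n}{2} + n + 1 + m(n+1) + \binom{m}{2}$, since $n = ch(1, m, \binom{m}{2})$.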

Let $\sigma = (A_0, A_1, A_2)$ be an arbitrary state of type $(1, m, \binom{m}{2})$, and write $A_0 = \{h\}$ for a
unique element $h \in S$. Let us now fix, once and for all, two one-one maps $f_1 : A_1 \to C$
and $f_2 : A_2 \to C_1$. The existence of $f_1$ and $f_2$ is ensured by our assumptions about $C$,
together with $|C_1| \geq \binom{m}{2}$.

Let the map $f : A_0 \cup A_1 \cup A_2 \to \{0,1\}^n$ be defined by cases as follows:

$f(y) = \mathbf{0}$ if $y \in A_0$;  $f(y) = f_1(y)$ if $y \in A_1$;  $f(y) = f_2(y)$ if $y \in A_2$.    (9)

Note that $f$ is one-one. For each $y \in A_0 \cup A_1 \cup A_2$ and $j = 1, \ldots, n$ let $f(y)_j$ be the $j$th
bit of the binary vector corresponding to $y$ via $f$.

Let the set $T_j \subseteq S$ be defined by $T_j = \{z \in S \mid f(z)_j = 1\}$, where $j = 1, \ldots, n$.
This is Paul's second batch of questions. Intuitively, letting $x_{Carole}$ denote Carole's
secret number, $T_j$ asks "is the $j$th bit of $f(x_{Carole})$ equal to one?" Again writing
yes = 1 and no = 0, Carole's answers to questions $T_1, \ldots, T_n$ determine an $n$-tuple of
bits $a = a_1 \ldots a_n$.

We shall show that the sequence $T_1, \ldots, T_n$ yields a perfect nonadaptive winning
strategy for $\sigma$. Let $\sigma_1, \sigma_2, \ldots, \sigma_n$ denote the states of knowledge successively obtained
from $\sigma$ after the answers $a_1, a_2, \ldots, a_n$. Arguing by cases we shall
show that $\sigma_n = (A^*_0, A^*_1, A^*_2)$ is a final state. By (1)-(2), any $z \in A_2$ that falsifies $> 0$
answers does not survive in $\sigma_n$, in the sense that $z \notin A^*_0 \cup A^*_1 \cup A^*_2$. Further, any $y \in A_1$
that falsifies $> 1$ answers does not survive in $\sigma_n$. And, of course, the same holds for $h$
if it turns out to falsify $> 2$ answers.

Case 1. $a \notin B_2(\mathbf{0}) \cup \bigcup_{y \in A_1} B_1(f(y)) \cup f(A_2)$.

Then $h \notin A^*_0 \cup A^*_1 \cup A^*_2$. As a matter of fact, from $a \notin B_2(\mathbf{0})$ it follows that $d_H(f(h), a)
= d_H(\mathbf{0}, a) > 2$, whence $h$ falsifies $> 2$ of the answers to $T_1, \ldots, T_n$, and $h$ does not
survive in $\sigma_n$. Similarly, for each $y \in A_1$ we must have $y \notin A^*_0 \cup A^*_1 \cup A^*_2$. Indeed,
the assumption $a \notin B_1(f(y))$ implies $d_H(f(y), a) > 1$, whence $y$ falsifies $> 1$ of the
answers to $T_1, \ldots, T_n$, and $y$ does not survive in $\sigma_n$. Finally, for all $z \in A_2$ we have
$z \notin A^*_0 \cup A^*_1 \cup A^*_2$, because the assumption $a \neq f(z)$ implies $d_H(f(z), a) > 0$, whence
$z$ falsifies at least one of the answers to $T_1, \ldots, T_n$, and $z$ then does not survive in $\sigma_n$.
We have proved that $A^*_0 \cup A^*_1 \cup A^*_2$ is empty, and $\sigma_n$ is a final state.

Case 2. $a \in B_2(\mathbf{0})$.

Then $h \in A^*_0 \cup A^*_1 \cup A^*_2$, because $d_H(f(h), a) = d_H(\mathbf{0}, a) \leq 2$, whence $h$ falsifies
$\leq 2$ answers. Our assumptions about $C$ ensure that, for all $y \in A_1$, $a \notin B_1(f(y))$. Thus,
$d_H(f(y), a) > 1$ and $y$ falsifies $> 1$ of the answers to $T_1, \ldots, T_n$, whence $y$ does not
survive in $\sigma_n$. This shows that $y \notin A^*_0 \cup A^*_1 \cup A^*_2$. A similar argument shows that, for all
$z \in A_2$, $z \notin A^*_0 \cup A^*_1 \cup A^*_2$. Therefore, $A^*_0 \cup A^*_1 \cup A^*_2$ only contains the element $h$, and
$\sigma_n$ is a final state.

Case 3. $a \in B_1(f(y))$ for some $y \in A_1$.

Arguing as in the previous cases, we see that $y$ is an element of $A^*_0 \cup A^*_1 \cup A^*_2$, but
neither $h$, nor any $z \in A_2$, nor any $y' \in A_1$ with $y' \neq y$ can be an element of $A^*_0 \cup A^*_1 \cup A^*_2$.
Thus $\sigma_n$ is a final state.

Case 4. $a = f(z)$ for some $z \in A_2$.

Then the same arguments show that $A^*_0 \cup A^*_1 \cup A^*_2$ only contains the element $z$, and
$\sigma_n$ is a final state.

Feasibility of the second batch of questions 

Lemma 8. For each integer $n \geq 7$ there exists an integer $d \geq 3$ and an $(n, m_n, d)$ code
$C_n$ such that $\mu(C_n) \geq 4$.

Proof. If 7 < n < 11 then by direct inspection in [4, Table TA], one can obtain an 
(n, e„,4) code An such that e„ > and /i(An) > 4. It follows that every set Cn C An 
such that I Cn |= is an (n, m„, 3) code with the additional property )J,{Cn) = 4, as 
required. Forn = 11, 12,by inspection in [4, Table 1-A], one can build an (n, ei,di) code 

yiA" 1 

I>n,iandan(n, 62,^2) code !>„, 2, such that 61,62 > 2^ ,diA 2 > 3,/x(I>n,i),M(iC>n,2) > 
4 and A{'Dn.i,'Dn, 2 ) > 1. Then Lemma 4 assures that such conditions hold for any 

rt+ 1 

n > 11. Upon noticing that 2^~ > m,n (Lemma 6), the desired conclusion follows by 
simply picking a subcode Cn C such that | Cn \= mn- 

Corollary 9. For $m = 5, 6, 7, \ldots$ let $\sigma$ be a state of type $(1, m, \binom{m}{2})$. Then there exists
a nonadaptive winning strategy $\mathcal{S}$ for $\sigma$ which is perfect.

Proof. Let $n = ch(\sigma)$. From the assumption $m \geq 5$ we get $n \geq 7$. By Definition 5,
$m \leq m_n$. By Lemma 8 there exists an $(n, m_n, d)$ code $C_n$ with $d \geq 3$ and $\mu(C_n) \geq 4$.
Picking now a subcode $C'_n \subseteq C_n$ such that $|C'_n| = m$, and applying Lemma 7, we have
the desired conclusion.

The remaining cases 

We shall extend Corollary 9 to the cases $m = 1$ and $m = 3$. Further, for $m = 2$ and
$m = 4$ we shall prove that the shortest nonadaptive winning strategy for a state of type
$(1, m, \binom{m}{2})$ requires $ch(1, m, \binom{m}{2}) + 1$ questions.

Proposition 10. For each $m = 1, 2, 3, 4$, let $\lambda(m)$ be the length of the shortest nonadaptive
winning strategy for some (equivalently, for every) state of type $(1, m, \binom{m}{2})$. Then
$\lambda(1) = 4$, $\lambda(2) = 6$, $\lambda(3) = 6$, $\lambda(4) = 7$. For $m \in \{1, 3\}$ the number $\lambda(m)$
satisfies the condition $\lambda(m) = ch(1, m, \binom{m}{2})$.

Proof. For $m = 1$ we have $\lambda(1) \geq ch(1, 1, 0) = 4$. Conversely, by Lemma 7, using the
singleton code {1111}, we also get $\lambda(1) \leq 4$.

For $m = 2$, by [6, pages 75-76], we have $\lambda(2) \geq 6$. On the other hand, taking the code
$C$ = {111100, 001111} and using Lemma 7, we obtain $\lambda(2) \leq 6$.

For $m = 3$ we have $\lambda(3) \geq ch(1, 3, 3) = 6$. Conversely, using the code {111100, 110011,
001111} and Lemma 7 we get $\lambda(3) \leq 6$.

Finally, let us consider the case $m = 4$. We have $\lambda(4) \geq ch(1, 4, 6) = 6$. In fact it can be
proved that no (6, 4, 3) code exists with minimum Hamming weight $\geq 4$.

Indeed, write such a code as C = where = (a; G C | wh(x) =i}, 

Thus direct inspection shows that | < 1 and | < 3. Further, if \ = 3 

then = 0. In eonclusion \C\ < 3. 

This and Lemma 7 are to the effect that $\lambda(4) \geq 7$. On the other hand, taking the
(7, 4, 3) code {1111000, 0001111, 0110011, 1111111} and again using Lemma 7, we
obtain $\lambda(4) \leq 7$, and the proof is complete.

Combining Corollary 9 and Proposition 10 we have 

Theorem 11. For each integer $m = 1, 3, 5, 6, 7, \ldots$ there is a strategy to guess a number
$x \in \{0, \ldots, 2^m - 1\}$ with up to two lies in the answers, using a first batch of $m$
nonadaptive questions and then, only depending on the answers to these questions, a
second batch of $ch(2^m, 0, 0) - m$ questions. In case $m \in \{2, 4\}$ the shortest such strategy
requires precisely $ch(2^m, 0, 0) + 1$ questions.

As shown in [6], in the fully adaptive Ulam-Renyi game a perfect winning strategy
exists if, and only if, $m \neq 2$.



5 Open Problems 

We conclude this paper with two open questions.

1. Is minimum adaptiveness compatible with shortest strategies also when more than
two tests may be erroneous?

2. Suppose answers to be bits transmitted, e.g., by a satellite via a noisy channel. Then
our results have the following equivalent counterpart: (i) the satellite sends us the $m$-tuple
$\mathbf{x}$ of bits of the binary expansion of $x$; (ii) we feed the received $m$-tuple $\mathbf{x}'$ back to the
satellite, without distortion, e.g., via a noiseless feedback channel; (iii) only depending
on $\mathbf{x}'$ the satellite sends us a final tip $\mathbf{z}$ of $q_2(m) - m$ bits, which is received by us as $\mathbf{z}'$,
in such a way that from the $q_2(m)$ bits $\mathbf{x}'\mathbf{z}'$ we are able to recover $x$, even if one or two
of the bits of $\mathbf{x}'\mathbf{z}'$ are the result of distortion. Is it possible to reduce the $m$-bit feedback
in our model to a shorter feedback, while still allowing the satellite to send only $q_2(m)$
bits?
Abstract. We present a highly tuned mergesort algorithm that improves the cost
bounds when used to sort linked lists of elements. We provide empirical comparisons
of our algorithm with other mergesort algorithms. The paper also illustrates
the sort of techniques that allow one to speed up a divide-and-conquer algorithm.



1 Introduction 

The main goal of this paper is to improve the asymptotic average-case cost $M_n$ of mergesort
under the usual assumption that the list to be sorted includes a random permutation
of $n$ different keys. Recall that mergesort is the default algorithm to sort a linked list of
keys. The meaning of "cost" above depends on the quantities of interest. For instance,
for most of the versions of mergesort that we present in this paper, measuring $M_n$ as
the number of key comparisons yields $M_n = n \log_2 n + o(n \log n)$, which is optimal
w.r.t. the leading term of the number of comparisons. However, as we shall see hereafter,
we may also consider other operations, like reading keys from memory, comparing
pointers, reading pointers from memory, writing pointers into memory, accessing keys in
local variables, updating keys in local variables, accessing pointers in local variables and
updating pointers in local variables. Taking into account and reducing the contribution
to the cost of all these operations produces a mergesort algorithm that is asymptotically
twice as fast (on many computers) as the first version we will start with. The author believes
that many of the results presented in this paper are original and have not appeared previously.
For the improvements already in the literature, we provide references to books where
those improvements can be found [1], [2], [5].

On the other hand, the paper also exemplifies the kind of arguments that allow one to
speed up a divide-and-conquer algorithm, in particular which factors affect the leading
term of the cost and which are asymptotically negligible. We illustrate how we can
analyse with little effort the asymptotic expected number of the elemental operations
mentioned above, always in order to reduce their contribution to the cost. In other words,
we will not restrict ourselves to the typical approach in the analysis of sorting methods,
which only considers the number of comparisons. Finally, we will show how such a
"common sense" rule as "dividing the problem into two halves is the best choice for a
sorting method" is by no means an unquestionable truth.

* This research was supported by the ESPRIT LTR Project no. 20244 (ALCOM-IT) and the
CICYT Project TIC97-1475-CE.

J. Nešetřil (Ed.): ESA'99, LNCS 1643, pp. 267-276, 1999.

© Springer-Verlag Berlin Heidelberg 1999
typedef struct node *link;
struct node { long key; link next; };
extern link DUMMY;

/* The list pointed to by c has at least one element. */
link mergesort(link c)
{ link a, b;

  if (c->next == DUMMY) return c;
  a = c; b = c->next;
  while ((b != DUMMY) && (b->next != DUMMY))
    { c = c->next; b = b->next->next; }
  b = c->next; c->next = DUMMY;
  return merge(mergesort(a), mergesort(b));
}



Fig. 1. The first mergesort algorithm 



2 The First Mergesort Algorithm 

Our starting point is the version of mergesort for linked lists given in Sedgewick’s 
classic book [5, pages 355 and 356] (that algorithm is presented in Figures 1 and 2 for 
completeness). It has been slightly adapted from the original one, in order to make it 
directly comparable with the rest of versions in this paper. In particular, we assume that 
the keys in the nodes are long integers, and that the last node of every list does not store 
the pointer NULL, but instead a pointer DUMMY to a global node whose key is larger than 
any of the keys in the list. 

To analyse the cost of that mergesort algorithm (and most of the ones we will present
later) we need to solve recurrences with the pattern

$M_n = t_n + M_{\lceil n/2 \rceil} + M_{\lfloor n/2 \rfloor}$    (1)

for $n \geq 2$, with some explicitly known value for $M_1$. Here $t_n$ is the toll function, or
non-recursive cost to sort a list with $n$ nodes, which includes the cost to divide the list
into two sublists plus the cost to merge the two recursively sorted sublists together. The
toll function will turn out to be linear in $n$ for all of the variants of mergesort introduced
in this paper; in other words, we will always have $t_n = B \cdot n + o(n)$ for some constant $B$
that depends on the particular variant of mergesort under consideration.

There are several different mathematical techniques to solve recurrences following
Equation (1). Since we are interested in analysing (and improving) the leading term of
the asymptotic cost, we will use the so-called master theorem, which is probably the
easiest of the tools that provide this information. For instance, a simple application of
the master theorem given in [4, page 452] yields the general solution

$M_n = B \cdot n \log_2 n + o(n \log n)$ .    (2)

Therefore, improving the asymptotic cost of mergesort is equivalent to reducing the
constant $B$, which depends on the cost to divide the original list and merge the two
sorted sublists.
link merge(link a, link b)
{ struct node head; link c = &head;

  while ((a != DUMMY) && (b != DUMMY))
    if (a->key <= b->key) { c->next = a; c = a; a = a->next; }
    else { c->next = b; c = b; b = b->next; }
  c->next = (a == DUMMY) ? b : a;
  return head.next;
}



Fig. 2. The first merge algorithm 



In the computation of $B$ we will consider nine different elemental operations: comparing
two keys, reading a key (the key field) from memory given a pointer to the
node where it is stored, comparing a pointer against DUMMY, reading (writing) a pointer
(the next field) from (into) memory given a pointer to the node where it is (must be)
stored, accessing (updating) a key stored in a local variable, and accessing (updating)
a pointer stored in a local variable. The cost of these operations will be denoted $C_k$,
$R_k$, $C_p$, $R_p$, $W_p$, $A_k$, $U_k$, $A_p$ and $U_p$, respectively. Since all the algorithms presented in
this paper are written in C, we have to make (reasonable) assumptions about how the C
instructions are translated in terms of the elemental operations above. If necessary, those
assumptions (which will become clear from the examples below) could be changed and
the analyses in this paper recomputed accordingly.

Let us analyse the cost of the merge function in Figure 2. The body of the while 
is executed about n times on the average, because at each iteration we move one step 
forward in either a or b, and when the first of a or b reaches its end the other list is almost 
empty with high probability. Loosely speaking, this is what the following proposition 
states. 

Proposition 1. Let $\ell$ and $m$ denote the number of keys in a and the number of keys
in b, respectively. Then the expected number of keys in b larger than any of the keys
in a is $m/(\ell + 1)$, and the expected number of keys in a larger than any of the keys in b
is $\ell/(m + 1)$.

Proof. Each of the keys in a or b is equally likely to appear in a as to appear
in b. Choose a key from b at random. Then the probability that this key is larger than any
of the $\ell$ keys in a is clearly $1/(\ell + 1)$, since this event is equivalent to randomly picking,
from a set with $\ell + 1$ keys, a key which turns out to be the largest of the set. This argument
can be extended to each of the $m$ keys in b, concluding that the expected number of keys
larger than all the $\ell$ keys in a is $m/(\ell + 1)$. A similar argument proves the symmetrical
case. ■

Setting $\ell = \lceil n/2 \rceil$ and $m = \lfloor n/2 \rfloor$ in the proposition above yields $n - o(n)$ (more
precisely, $n - O(1)$) as the expected number of iterations of the loop in Figure 2. Moreover,
the rest of the operations (passing parameters and allocating space for local variables,
the first assignment to c, the final assignment to c->next and returning a pointer to the
sorted list) only contribute $O(1)$ time to the merging process. Therefore, we focus our
/* The list c has n elements, where 1 <= t <= n. */
link mergesort(link c, long t, link *s)
{ link m1, m2, s_rec; long t_rec;

  if (t == 1) { *s = c->next; c->next = DUMMY; return c; }
  t_rec = (t+1)/2;
  m1 = mergesort(c, t_rec, &s_rec);
  m2 = mergesort(s_rec, t - t_rec, s);
  return merge(m1, m2);
}



Fig. 3. The second mergesort algorithm 



attention on the cost of each iteration: we twice access a pointer and compare it against
DUMMY; we then read two keys from memory and compare them; we write a next field,
access two pointers, update two pointers, and read a next field. The constant of the
linear term in the cost to merge the sorted sublists is thus

$2(A_p + C_p) + (2R_k + C_k) + (W_p + 2A_p + 2U_p + R_p)$ .    (3)

On the other hand, the body of the while in Figure 1 is executed $\lfloor (n-1)/2 \rfloor$ times, so
by a reasoning similar to the one above we can compute the constant of the linear term
in the cost to split the list into two sublists as

$((A_p + C_p) + (R_p + C_p) + (2U_p + 3R_p))/2$ .    (4)

Altogether, adding the constants in (3) and (4), we conclude that the cost of the mergesort
algorithm in Figure 1 (with the merge function in Figure 2) is

$M_n \sim (C_k + 2R_k + 3C_p + 3R_p + W_p + 9A_p/2 + 3U_p)\, n \log_2 n$ .    (5)

3 Improving Mergesort 

In order to improve the mergesort algorithm of the last section, we first observe that
we should pass to it as an argument the number of nodes in the (sub)list to be sorted.
This allows us to reduce the cost of dividing the list into two parts from linear to
constant, thus making this cost asymptotically irrelevant (see [1, page 173]). The trick
consists in adding a parameter to the mergesort procedure. It then receives as input
two parameters: a pointer c to the beginning of a list, and an integer t. This mergesort
procedure only sorts the first t nodes of the list c, and returns two pointers: one to the
beginning of the now sorted sublist with t nodes, the other to the beginning of another
sublist with the nodes not sorted yet. As can be seen in Figure 3, after sorting the first
half of the list we already have a pointer to the beginning of the second half of the list
(because to sort a sublist we certainly must reach its end at least once), and therefore we
do not need to explicitly traverse the original list to be sorted, neither to its end nor even
to its middle.
link merge(link a, link b)
{ struct node head; link c = &head;

  while (b != DUMMY)
    if (a->key <= b->key) { c->next = a; c = a; a = a->next; }
    else { c->next = b; c = b; b = b->next; }
  c->next = a;
  return head.next;
}



Fig. 4. The second merge algorithm



By this easy improvement, the process of dividing the list into two parts is no longer
asymptotically relevant to the cost of mergesort. The cost of the procedure in Figure 3
(with t = n) only depends significantly on the constant of the cost of the merging
phase (3), and hence is

$M_n \sim (C_k + 2R_k + 2C_p + R_p + W_p + 4A_p + 2U_p)\, n \log_2 n$ .    (6)

From now on, we will assume that our mergesort algorithm is the one given in 
Figure 3, and we will concentrate on finding faster merge functions. All the merge 
functions presented here have been devised to work correctly if there were repeated 
keys, and, moreover, to preserve the relative order of equal keys in the list to be sorted. 
Therefore, all the versions of mergesort in this paper are stable. 

The first improved version of merge is presented in Figure 4. The only difference
from the one given in Figure 2 is that the while only checks whether b equals DUMMY.
This merge function works even when the sublist a reaches its end before the sublist b,
because at that moment a equals DUMMY, a pointer to a global node with key larger than
any of the keys in the list. Therefore, the comparison (a->key <= b->key) always
fails, and we move forward in b until we reach its end.

The reason for the variation above is clear: under the assumption that we are given a
random permutation of keys, by Proposition 1 the expected number of iterations of the
loop is still $n - o(n)$, while the cost of each iteration is reduced by $A_p + C_p$, the cost of the
comparison (a == DUMMY). The cost of mergesort with this variant of merge is thus

$M_n \sim (C_k + 2R_k + C_p + R_p + W_p + 3A_p + 2U_p)\, n \log_2 n$ .    (7)

The next improvement comes from the observation that we are updating as many
pointers as we are traversing, but roughly half of them are already pointing to the right
node (this happens when two consecutive nodes in a or b also appear consecutively in
the final list). Therefore, we modify the merge function to traverse but not update those
pointers (see [2, page 167, exercise 15]). As shown in Figure 5, the main loop keeps
the invariant a->key <= b->key, where a is the node that must be attached next to c. In
that loop, we first link a to c, updating a and c accordingly, and afterwards we keep
traversing pointers in the list a until we reach a node such that a->key > b->key. At
that moment, the same process starts again, but interchanging the roles of a and b. The
rest of the merge function is designed to fit the main while.
link merge(link a, link b)
{ struct node head; link c;

  if (a->key <= b->key) c = &head;
  else { head.next = b; c = b; b = b->next;
         while (a->key > b->key) { c = b; b = b->next; }
       }
  while (b != DUMMY)
  { c->next = a; c = a; a = a->next;
    while (a->key <= b->key) { c = a; a = a->next; }
    c->next = b; c = b; b = b->next;
    while (a->key > b->key) { c = b; b = b->next; }
  }
  c->next = a;
  return head.next;
}



Fig. 5. The third merge algorithm 



link merge(link a, link b)
{ struct node head; link c; long key_a, key_b;

  if (a->key <= b->key) { head.next = a; key_b = b->key; }
  else { head.next = b; c = b; b = b->next;
         while (a->key > b->key) { c = b; b = b->next; }
         c->next = a; key_b = b->key;
       }
  while (b != DUMMY)
  { for (;;)
    { c = a->next;
      if ((key_a = c->key) > key_b) { a->next = b; a = c; break; }
      a = c->next;
      if ((key_a = a->key) > key_b) { c->next = b; break; }
    }
    for (;;)
    { c = b->next;
      if ((key_b = c->key) >= key_a) { b->next = a; b = c; break; }
      b = c->next;
      if ((key_b = b->key) >= key_a) { c->next = a; break; }
    }
  }
  return head.next;
}



Fig. 6. The fourth merge algorithm



The analysis of the cost of this variant of merge is similar to the ones done before.
The cost of all the instructions outside the main loop is again $O(1)$. The main loop is
                   500 keys             5000 keys           50000 keys          500000 keys

Figures 1 and 2    0.014548 s (100)     0.20727 s (100)     2.8083 s (100)      43.004 s (100)
Figures 1 and 4    0.013595 s (93.45)   0.19195 s (92.61)   2.5864 s (92.10)    40.555 s (94.31)
Figures 1 and 5    0.012492 s (85.87)   0.17574 s (84.79)   2.3820 s (84.82)    38.593 s (89.74)
Figures 1 and 6    0.011921 s (81.94)   0.16718 s (80.66)   2.3149 s (82.43)    38.010 s (88.39)
Figures 3 and 2    0.012039 s (82.75)   0.16235 s (78.33)   1.9635 s (69.92)    26.579 s (61.81)
Figures 3 and 4    0.011154 s (76.67)   0.14747 s (71.49)   1.7631 s (62.78)    24.446 s (56.85)
Figures 3 and 5    0.010074 s (69.25)   0.13171 s (63.55)   1.5482 s (55.13)    22.098 s (51.39)
Figures 3 and 6    0.009453 s (64.98)   0.12230 s (59.01)   1.4733 s (52.46)    21.590 s (50.21)



Fig. 7. Empirical times for different algorithms and list sizes



executed once for every subsequence with keys from b that appears, in the final list, after
some key from a. But this quantity is easy to compute.

Proposition 2. Let $\ell$ and $m$ denote the number of keys in a and the number of keys in b,
respectively. Then the expected number of keys from b which appear immediately after
a key from a in the final list is $\ell \cdot m/(\ell + m)$.

Proof. Consider the list once it is sorted. Each position of the list has a key from b with
probability $m/(\ell + m)$, because a and b are random permutations. If this happens, each
position (except the first one) has a key from a before it with probability $\ell/(\ell + m - 1)$.
The expected contribution of every position of the list but the first one to the quantity we
are computing is thus $\ell \cdot m/((\ell + m)(\ell + m - 1))$, and there are $(\ell + m - 1)$ of them. ■

Therefore, the main loop is executed $n/4 - o(n)$ times on the average (just plug $\ell =
\lceil n/2 \rceil$ and $m = \lfloor n/2 \rfloor$ in the last proposition), and the first and third lines in it too. The
comparison of the while in the second line inside the main loop is evaluated once for
every key in a (except maybe for a few keys at the very end), which means $n/2 - o(n)$
times on the average. The body of that while is executed the same number of times, minus
one per each iteration of the main loop, that is, $(n/2 - o(n)) - (n/4 - o(n)) = n/4 + o(n)$
times on the average. Finally, the asymptotic cost of the fourth line is the same as that
of the second one. Altogether, the constant $B$ associated to this merge function is

$(A_p + C_p)/4 + 2(W_p + 2A_p + 2U_p + R_p)/4 + 2(2R_k + C_k)/2 + 2(2U_p + A_p + R_p)/4$ ,

and the cost of this variant of mergesort is

$M_n \sim (C_k + 2R_k + C_p/4 + R_p + W_p/2 + 7A_p/4 + 2U_p)\, n \log_2 n$ .    (8)
Let us speed up the last merge function a little more. On the one hand, every key in a
and b is read from memory twice on the average, which can be reduced to just once
by storing the key at the beginning of the current a and the key at the beginning of the
current b in a couple of local variables. On the other hand, while traversing pointers
in, say a, we do { c = a; a = a->next; }, but it would be cheaper to do { c =
a->next; } and { a = c->next; } alternately, keeping track of which of a or c
points to the node with the key to be compared next, and which to the previous node
(this is a sort of bootstrapping). These improvements can be found in Figure 6.

The analysis of this variant of merge follows the same lines as those presented before.
The comparison of b against DUMMY is executed $n/4 + o(n)$ times on the average. The
first for is related to the keys in a. For each of those keys (except maybe for a small
number of keys at the end of a) we either make the assignment in the first line and the
comparison in the second line, or the assignment in the third line and the comparison
in the fourth line. The same is true regarding the second for and the keys in b (except
maybe for a small number of keys at the beginning of b). Therefore, we can add $n - o(n)$
times the cost of one of those assignments and one of those comparisons to the cost of
merging. For the moment we have a term

$(A_p + C_p)/4 + (U_p + R_p) + (R_k + U_k + A_k + C_k)$    (9)

contributing to the constant $B$.

It remains to compute the contribution of the statements between braces in the
second and fourth lines of both for loops. For instance, the set of statements { a->next =
b; a = c; break; } is executed once for every key $k$ from a which lies immediately
before a key from b in the final list, if the subsequence of keys from a where $k$ appears
has an odd number of keys. It is difficult to compute an exact expression for this quantity.
However, under the assumption that the number of keys in a is $p \cdot n + O(1)$ for some $0 <
p < 1$, the following proposition (we omit its proof) gives us a useful asymptotic
approximation.

Proposition 3. Let $\ell = p \cdot n + O(1) > 0$ with $0 < p < 1$ be the number of keys in a.
Then the expected number of odd-length subsequences with keys from a which appear
before some key from b in the final list is $(1 - p)p/(1 + p) \cdot n + o(n)$.

Very similar arguments and asymptotic approximations produce the quantities
$(1 - p)p^2/(1 + p) \cdot n + o(n)$ as the expected number of executions of the statements between
braces in the fourth line of the first for, $(1 - p)p/(2 - p) \cdot n + o(n)$ for the second
line of the second for, and $(1 - p)^2 p/(2 - p) \cdot n + o(n)$ for the fourth line of the second
for. With $p = 1/2$, their total contribution to the constant $B$ is thus

$2\left(\tfrac{1}{6}(W_p + 2A_p + U_p) + \tfrac{1}{12}(W_p + A_p)\right)$ ,    (10)

and adding (9) and (10) we deduce that this version of mergesort has cost

$M_n \sim (C_k + R_k + C_p/4 + R_p + W_p/2 + A_k + U_k + 13A_p/12 + 4U_p/3)\, n \log_2 n$ .    (11)

In Figure 7 we show empirical average times (in seconds) for all the possible combinations
of the mergesort algorithms and merge functions given in this paper. Those times
were obtained on a PC by sorting random lists with 500, 5000, 50000 and 500000 keys
several times each (10000, 1000, 100 and 10 times, respectively). In parentheses
we provide the relative times (in percentage) w.r.t. the first algorithm. It should be clear
from Figure 7 that the theoretical improvements do significantly reduce the actual time
to sort.

The last version of mergesort is optimal regarding the number of comparisons, and, at
the merging phase, reads from memory each key and each pointer at most once, updates
only the necessary pointers, and compares as few pointers against DUMMY as possible.
So we could wonder if we can improve mergesort further. Most surprisingly, there is an
affirmative answer to this question (at least from a theoretical point of view).

So far we have assumed that dividing the list to be sorted into two halves
is the most reasonable choice. However, we can easily compute the expected cost of
mergesort if we divide the original list into two sublists with $\ell = p \cdot n + O(1)$ and
$m = n - \ell$ keys each, with $0 < p < 1$ and $p \neq 1/2$. Making use of Propositions 1, 2
and 3, we deduce that the constant $B$ of the toll function in this case is

$B(p) = (1 - p)p\,(A_p + C_p) + (U_p + R_p) + (R_k + U_k + A_k + C_k)$    (12)
$\quad + \frac{(1 - p)p}{1 + p}(W_p + 2A_p + U_p) + \frac{(1 - p)p^2}{1 + p}(W_p + A_p)$
$\quad + \frac{(1 - p)p}{2 - p}(W_p + 2A_p + U_p) + \frac{(1 - p)^2 p}{2 - p}(W_p + A_p)$ .

Applying the discrete master theorem [4, page 452], we get the solution

$M_n = \frac{B(p)}{H(p)} \cdot n \log_2 n + o(n \log n)$ ,

where $H(p) = -(p \log_2 p + (1 - p) \log_2 (1 - p))$.

The question is which value of $p$ minimises the function $f(p) = B(p)/\mathcal{H}(p)$.
In this paper, we only consider the most significant operations, related to the costs $\mathcal{C}_k$,
$\mathcal{R}_k$, $\mathcal{C}_p$, $\mathcal{R}_p$ and $\mathcal{W}_p$. In this case we have
$B(p) = (\mathcal{C}_k + \mathcal{R}_k + \mathcal{R}_p) + (1-p)p\,(\mathcal{C}_p + 2\mathcal{W}_p)$,
and the analysis of the behaviour of $f(p)$ produces the surprising result that $p = 1/2$ is
not always the optimal choice. Indeed, when $(\mathcal{C}_p + 2\mathcal{W}_p) > \tau \cdot (\mathcal{C}_k + \mathcal{R}_k + \mathcal{R}_p)$ for
the threshold value $\tau = 4/(2 \ln 2 - 1) \approx 10.35480$, the function $f(p)$ achieves a local
maximum at $p = 1/2$, and two symmetrical absolute minima at $p^*$ and $1 - p^*$, where $p^*$
is the unique solution of the equation

$$
\frac{\mathcal{C}_p + 2\mathcal{W}_p}{\mathcal{C}_k + \mathcal{R}_k + \mathcal{R}_p}
= \frac{\log_2 p - \log_2 (1-p)}{(1-p)^2 \log_2 (1-p) - p^2 \log_2 p}
$$

in the interval $(0, 1/2)$. The intuition is that choosing $p \neq 1/2$ increases the number
of key comparisons, read keys and read pointers, but can reduce the number of pointer
comparisons and updated pointers. A similar result (and the same threshold value) was
shown for quicksort when used to sort an array of keys [3]. Note that from a practical
standpoint, it is not likely (at least under the current technology) that
$(\mathcal{C}_p + 2\mathcal{W}_p) > \tau \cdot (\mathcal{C}_k + \mathcal{R}_k + \mathcal{R}_p)$ for the usual times of elemental operations.
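
The threshold behaviour described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the simplified $f(p)$ on a grid, where the parameter names `key_cost` (standing for the sum of the key-related costs) and `pointer_cost` (standing for the pointer-related term) are illustrative assumptions.

```python
import math


def entropy(p):
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def f(p, key_cost, pointer_cost):
    """f(p) = B(p)/H(p) with the simplified B(p) from the text.

    key_cost stands for C_k + R_k + R_p and pointer_cost for C_p + 2*W_p
    (hypothetical names, chosen only for this sketch).
    """
    return (key_cost + (1 - p) * p * pointer_cost) / entropy(p)


# Threshold value tau = 4 / (2 ln 2 - 1) quoted in the text.
TAU = 4 / (2 * math.log(2) - 1)


def best_split(key_cost, pointer_cost, steps=10_000):
    """Grid search for the minimiser of f on (0, 1/2]."""
    grid = (i / (2 * steps) for i in range(1, steps + 1))
    return min(grid, key=lambda p: f(p, key_cost, pointer_cost))
```

With a cost ratio below `TAU` the grid search stays at (or next to) $p = 1/2$, whereas a ratio above `TAU` moves the minimiser away from the balanced split, in agreement with the statement above.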
Abstract. Symbol ranking compression algorithms are known to achieve a very
good compression ratio. Off-line symbol ranking algorithms (e.g., bzip, szip) are
currently the state of the art for lossless data compression because of their excellent
compression/time trade-off.

Some on-line symbol ranking algorithms have been proposed in the past. They
compress well but their slowness makes them impractical. In this paper we design
some fast on-line symbol ranking algorithms by fine tuning two data structures
(skip lists and ternary trees) which are well known for their simplicity and
efficiency.



1 Introduction 

The introduction of the Burrows-Wheeler transform (BWT) [5,13] has set a new standard 
in data compression algorithms. Algorithms based on the BWT have speed comparable to 
the parsing-based algorithms (such as gzip and pkzip), and achieve a compression close 
to the much slower PPM-based algorithms (for the results of extensive testing see [1,8]). 
Two highly optimized compressors (bzip and Szip) which are based on the BWT are 
now available for many platforms [15,17]. The main drawback of these algorithms is 
that they are not on-line, that is, they must process the whole input (or a large portion 
of it) before a single output bit can be produced. Since in many applications (e.g., data 
transmission) this is not acceptable, several efforts [9,12,20] have been made to design
on-line counterparts of BWT-based algorithms. Since BWT-based algorithms can be 
seen as symbol ranking algorithms, the new algorithms are usually referred to as on-line 
symbol ranking algorithms. 

Compared to bzip (the best known BWT-based algorithm) on-line algorithms achieve
a slightly worse compression ratio. This was to be expected since on-line algorithms
make decisions knowing only a part of the input. Unfortunately, the on-line algorithms
described in [9,12,20] are also slower than bzip (by a factor of 10 or more) and this makes
them impractical.

In this paper we use two well known data structures, skip lists [11,16] and ternary
trees [2,16], to design several efficient on-line symbol ranking algorithms. We show that
after some fine tuning our algorithms outperform gzip (in terms of compression/speed
trade-off) and, for some kinds of input files, they are as fast as bzip.

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 277-288, 1999.

© Springer-Verlag Berlin Heidelberg 1999
size   context   predictions
 3.    iss       { _ }
 2.    ss        { _ }
 1.    s         { s, w }
 0.    nil       { s, i, m, w }

Fig. 1. Construction of the next symbol candidate list in the Howard-Vitter symbol ranking
algorithm with c = 3. Assume the input string is s = swiss_miss_is_missing, and that we are
coding the last i. For i = 3, 2, 1, 0, we show the context of size i and the list of symbols following
this context ordered by recency of occurrence. Note that i = 0 corresponds to the empty context.
In this case the prediction is done considering all symbols seen so far ordered by recency of
occurrence. The list of candidate next symbols is obtained by merging the predictions of each context
and removing duplicates. In our example we get {_, s, w, i, m} so the incoming symbol i is
encoded with the rank 3.

2 Symbol Ranking Compression 

Loosely speaking, an on-line symbol ranking compression algorithm works as follows.
Let s' denote the portion of the input string which has been already coded. On the basis
of the statistics of s', the algorithm builds a list of candidate next symbols ranked by likelihood
of occurrence. That is, the most likely symbol is put in position 0, the second most
likely in position 1, and so on. The incoming symbol, that is the symbol following s',
is then encoded with its position in this list. These encoded values, which hopefully
consist mainly of small integers, are then compressed using on-line (one pass) Huffman
coding [18] or arithmetic coding [19].

The first symbol ranking algorithm proposed in the literature is the one by Howard
and Vitter [10]. Given s', this algorithm considers its suffix of size c, $w = \beta_1 \cdots \beta_c$,
where c > 0 is an assigned constant. The string w is called the size c context of the
incoming symbol. The list of candidate next symbols is built by looking at the previous
occurrences of w inside s' and choosing the symbols following these occurrences. The
process is repeated with the context of size c − 1, $w' = \beta_2 \cdots \beta_c$, the context of size
c − 2, and so on (the details are given in Fig. 1).
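
The construction of the candidate list can be sketched in a few lines of Python. This is only an illustration of the procedure as described above and in Fig. 1, written as a naive quadratic scan; the function names are assumptions introduced here and this is not the implementation of Howard and Vitter.

```python
def followers_by_recency(history, context):
    """Symbols that followed `context` inside `history`, most recent first."""
    seen = []
    k = len(context)
    # Scan occurrences from the most recent one back to the oldest one.
    for start in range(len(history) - k - 1, -1, -1):
        if history[start:start + k] == context:
            symbol = history[start + k]
            if symbol not in seen:
                seen.append(symbol)
    return seen


def candidate_list(history, c):
    """Merge the predictions of the contexts of size c, c-1, ..., 0."""
    candidates = []
    for size in range(min(c, len(history)), -1, -1):
        context = history[len(history) - size:]
        for symbol in followers_by_recency(history, context):
            if symbol not in candidates:
                candidates.append(symbol)
    return candidates


def rank_of(history, symbol, c):
    """Rank of the incoming symbol, or None if it was never seen."""
    candidates = candidate_list(history, c)
    return candidates.index(symbol) if symbol in candidates else None
```

For the example of Fig. 1, `rank_of("swiss_miss_is_miss", "i", 3)` evaluates the rank of the last i of swiss_miss_is_missing against the merged list `['_', 's', 'w', 'i', 'm']`.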

Recently, other on-line symbol ranking algorithms have been proposed [9,12,20]. 
These algorithms build the candidate list using strategies similar to the one of Fig. 1. 
The main difference is that, instead of considering the context of a fixed size c, they look 
for the longest suffix of s' which appears elsewhere in s' and use it as the starting context. 
In other words, they use the longest context which can be used to get a prediction for 
the incoming symbol. 

These algorithms are considered the on-line counterparts of the BWT-based algo- 
rithms. In fact, as observed for example in [6,8], BWT-based algorithms use essentially 
the same principle, the only difference being that they make predictions on the basis of 
the whole input string. This is possible because they look at the whole input before the 
actual coding is done. 

On-line symbol ranking algorithms can be seen also as simplified versions of PPM
algorithms [6,7]. For each context PPM algorithms maintain the list of candidate next
symbols enriched with a frequency count for each symbol. The incoming symbol is
encoded using roughly $-\log p$ bits, where p is the “empirical probability” of the incoming
symbol determined on the basis of the frequency counts.

Not surprisingly, on-line symbol ranking algorithms achieve a compression ratio
which is not as good as that of BWT-based algorithms or PPM algorithms, but they
are generally superior to the dictionary-based compressors (such as compress and
gzip). Unfortunately, on-line symbol ranking algorithms are not competitive in terms of
running time. Howard and Vitter [10] develop fast procedures for encoding the ranks, but
they do not provide enough data to assess the overall performance of their algorithms.
Yokoo [20] reports that his algorithm is twenty times slower than gzip. For the algorithms
in [12] and [9] we can only estimate, on the basis of the reported running times, that
they are respectively fifteen times and seventy times slower than gzip.

In the next section we address the problem of designing efficient on-line symbol ranking
algorithms. We concentrate our efforts on the design of fast procedures for determining
the rank of the incoming symbol. In the past this was done using a variety of data 
structures: a binary tree with additional links in [20], suffix trees in [12], hashing in [9], 
and a multiply linked list in [10]. In our algorithms we use skip lists [11], and ternary 
trees [2], improved with some ad hoc modifications. 

Note that our work is complementary to the one of Howard and Vitter [10]. They
use a standard data structure for determining the ranks and develop fast procedures for
encoding them (quasi-arithmetic coding and Rice coding). On the contrary, we develop
new data structures for fast determination of the ranks and encode them using, as a black
box, the CACM ’87 arithmetic coding routines [19]. Finally, we stress that arithmetic
coding has some latency, that is, it sometimes requires several input symbols before
emitting some output. Hence, our algorithms are not strictly on-line¹. If one wants a
completely on-line algorithm the final encoding should be done using Huffman coding
(possibly with multiple tables [17] to improve compression).



3 Design of Efficient Symbol Ranking Algorithms 

In this section we describe several on-line symbol ranking algorithms. Our emphasis is
on the design of efficient procedures for determining the rank of the incoming symbol.
In all our algorithms we compress the sequence of ranks by means of arithmetic coding
but we did not try to optimize this step.

Let s denote the whole input string and let s' denote the portion of the input which has
been already coded. Our first design decision was to consider the suffixes of s' (which
are the contexts of the incoming symbol) only up to a fixed maximum size c > 0. This
is the approach of Howard and Vitter [10] (see Fig. 1). We decided to follow it, instead
of the “unbounded context” approach of the more recent algorithms, for two reasons.
The first one is that a bounded context leads to more efficient algorithms, and the second
is that the results in [14] show that (for BWT-based algorithms) even a small context
yields a very good compression.

Our basic strategy for determining the rank of the incoming symbol works as follows.
We store all c-length contexts (that is, all c-length substrings) which have appeared in s'.

¹ Note, however, that the latency of arithmetic coding is negligible compared to the one of the
Burrows-Wheeler algorithm.

For each context we maintain the list of the symbols following it ordered by recency of
occurrence. This is achieved using the Move-to-Front strategy [4]: when we encounter
the context w followed by the symbol a, we put a at the top of the list associated to w.
Because of this update strategy we call this list the MTF list associated to w. MTF
lists are used to determine the rank of the incoming symbol as follows. If the incoming
symbol a is in the MTF list of the current context w its rank is simply its position in
this list. This is by far the most common case. However, at the first occurrence of wa
in s, a will not be in the MTF list of w. In this case we say that a is “new” and its rank is
determined by considering either a shorter context or those c-length contexts which are
similar to w. We stress that the candidate next symbol list mentioned in the previous
section was introduced only to illustrate the working of symbol ranking algorithms but
we do not need to actually build it. Our algorithms build (implicitly) only its upper part,
that is, the minimum portion required to determine the rank of the incoming symbol.
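
The common case, in which the incoming symbol is already in the MTF list of the current context, can be sketched as follows. This is a minimal illustration that assumes a plain Python dictionary in place of the skip list or ternary tree discussed below; the class name `MTFContexts` and its method are hypothetical.

```python
from collections import defaultdict


class MTFContexts:
    """One Move-to-Front list per context (dictionary-based sketch)."""

    def __init__(self):
        # Maps each context string to its MTF list, most recent symbol first.
        self.lists = defaultdict(list)

    def rank_and_update(self, context, symbol):
        """Return the rank of `symbol` in the MTF list of `context`.

        Returns None when the symbol is "new" for this context. In both
        cases the symbol ends up at the front of the list.
        """
        mtf = self.lists[context]
        if symbol in mtf:
            rank = mtf.index(symbol)
            mtf.pop(rank)
        else:
            rank = None  # "new" symbol: handled by a separate strategy
        mtf.insert(0, symbol)
        return rank
```

A returned rank of `None` marks exactly the situation where the text falls back to shorter or similar contexts.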

The algorithms described in the following sections differ in the strategy for encoding 
“new” symbols and in the data structure we use for maintaining the c-length substrings 
of s. Note that this data structure should perform efficiently a single operation: to locate 
a string which has been already inserted and to insert it if not present. We call this a 
find/insert operation. 

3.1 Skip List Based Algorithms 

In our first algorithm we maintain the ordered list of contexts (c-length substrings) using
a skip list. The skip list [11] is a probabilistic data structure which has simple and
efficient algorithms for the find/insert operation whose cost is, with high probability,
logarithmic in the number of list elements. In order to use a skip list each context w
must be converted to an integer: this is easily done by juxtaposing the bit representation
of the symbols in w. Since the GNU C compiler supports 64-bit long integers, with a
single value we can handle contexts of size up to 8. This is more than adequate since in 
our experiments we found that increasing the context beyond 6 increases significantly 
the memory usage without providing a noticeable improvement in the compression ratio 
(this is in accordance with the results in [14]). 

The conversion from a string $w = \beta_1 \cdots \beta_c$ to an integer $N_w$ is done “reversing”
the symbols in w. That is, the most significant bits of $N_w$ are those of $\beta_c$, while the least
significant bits are those of $\beta_1$. As a result, in the skip list the contexts are sorted in right
to left lexicographic order. This does not affect the performance of the find/insert
operation, but ensures that the contexts close to w in the skip list are (in general) similar to
w (since they usually have a common suffix). Therefore, each time the incoming symbol
a is “new”, we compute its rank considering the MTF lists of the elements adjacent to w
in the skip list. The details of the algorithm are best described with an example and are
given in Fig. 2. Note that in [11] skip list elements have only forward pointers; since we
need to access the contexts preceding and following the current context, we must add a
backward pointer to each skip list element.
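
The "reversed" conversion and the resulting neighbour search can be illustrated with a short sketch. It assumes 8-bit symbols and uses a sorted Python list as a stand-in for the skip list; the alternation between following and preceding neighbours in `search_order` is inferred from the order listed in the caption of Fig. 2, and both function names are hypothetical.

```python
def context_to_int(context: bytes) -> int:
    """Pack a context so that its last symbol is the most significant byte."""
    key = 0
    for position, byte in enumerate(context):
        key |= byte << (8 * position)
    return key


def search_order(sorted_contexts, w):
    """Order in which contexts are visited when the symbol is "new".

    `sorted_contexts` must be sorted by context_to_int and contain w.
    Neighbours are visited alternately after and before w.
    """
    pos = sorted_contexts.index(w)
    order = [w]
    step = 1
    while pos - step >= 0 or pos + step < len(sorted_contexts):
        if pos + step < len(sorted_contexts):
            order.append(sorted_contexts[pos + step])
        if pos - step >= 0:
            order.append(sorted_contexts[pos - step])
        step += 1
    return order
```

Sorting the eight contexts of the swiss_miss_is_missing example by `context_to_int` yields the right to left lexicographic order, and contexts ending in the same symbol as w are visited first.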

The MTF lists are implemented using an array of char’s (8-bit variables). This 
implementation requires very little memory, but has the drawback that we must examine 
all symbols in the MTF list until we find the desired one. In addition, when a new 
symbol must be inserted the whole list must be copied to a new (larger) array. Profiling 
 5.  s_  →  {m, i}
 8.  _i  →  {s}
 7.  mi  →  {s}
 2.  wi  →  {s}
 6.  _m  →  {i}
 3.  is  →  {_, s}
 4.  ss  →  {i, _}
 1.  sw  →  {i}

Fig. 2. Example of a symbol ranking algorithm based on a skip list. Assume that c = 2, the input
string is s = swiss_miss_is_missing, and that we are coding the letter n. Eight distinct contexts
have been encountered so far; in the skip list they are ordered in right to left lexicographic order.
The numbers on the left denote the order in which contexts have been inserted in the list. The
rightmost column shows the MTF list for each context. The current context is si; it will be inserted
between mi and wi. Our algorithm will search the incoming symbol n in the MTF lists according
to the following order: si, wi, mi, _m, _i, and so on. In this specific example the incoming symbol
does not belong to any of these lists. The important point is that the contexts “similar” to si (those
ending in i) are considered first. After the coding, the symbol n is added in the MTF list of the
newly created context si; the other MTF lists are not modified.

 5.  s_  →  {(m,6), (i,8)}
 8.  _i  →  {(s,3)}
 7.  mi  →  {(s,3)}
 2.  wi  →  {(s,3)}
 6.  _m  →  {(i,7)}
 3.  is  →  {(_,5), (s,4)}
 4.  ss  →  {(i,9), (_,5)}
 1.  sw  →  {(i,2)}

Fig. 3. The example of Fig. 2 with the MTF pointers added. If at any time during the coding we
encounter the context is followed by the symbol s, we reach the skip list element corresponding
to the next context (ss) following the pointer 4 which is associated to s in the is MTF list.



shows that these drawbacks have little impact in practice. This is probably due to the fact
that the inputs of data compression algorithms are (usually) files with a strong structure.
Experiments show that when we compress a (large) text file using a context of size 4,
the incoming symbol a is the first symbol of the MTF list more than half of the time.
For these reasons we did not implement the data structure for MTF lists described in [3]
which is asymptotically faster but requires a larger amount of memory.

The algorithm we have just described is already reasonably efficient. We can make it
faster by storing some additional information in our data structure. Profiling shows that
the find/insert operation takes up to 35% of the total running time. We observe
that the current context $w = \beta_1 \cdots \beta_c$ and the next symbol a completely determine the
next context (which is in fact $\beta_2 \cdots \beta_c a$). Hence, we can associate to each symbol in
the MTF list a pointer to the skip list element corresponding to the next context. We call
these additional pointers MTF pointers (see Fig. 3).

The use of MTF pointers increases the cost of updating the MTF list (symbols and 
pointers must be moved in lockstep), but this is (usually) more than compensated by the 
reduced number of find/insert operations. A big drawback of MTF pointers is that
they increase significantly the space requirement of the algorithm. For this reason we 
have tested a variant in which we store the MTF pointer only for the first element of each 
MTF list (this is the rank 0 symbol according to the current context). It turns out that 
this new variant is as fast as the previous one and requires very little additional memory 
compared to the algorithm without MTF pointers (see Table 1). 

We have called the variant which uses only rank 0 pointers algorithm Sr_sl and we 
have tested it using files of the Canterbury corpus. The results are reported in Section 4. 
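
The effect of rank 0 pointers on the number of find/insert operations can be illustrated with a small sketch. It is an assumption-laden toy: a Python dictionary replaces the skip list, and the names `ContextNode`, `Index` and `count_lookups` are hypothetical. It only counts how many searches are avoided when the incoming symbol is the first one in the MTF list of the current context.

```python
class ContextNode:
    def __init__(self, key):
        self.key = key
        self.symbols = []       # MTF list, most recent symbol first
        self.rank0_next = None  # node of the context reached via symbols[0]


class Index:
    """Dictionary standing in for the skip list; counts find/insert calls."""

    def __init__(self):
        self.nodes = {}
        self.lookups = 0

    def find_insert(self, key):
        self.lookups += 1
        node = self.nodes.get(key)
        if node is None:
            node = self.nodes[key] = ContextNode(key)
        return node


def count_lookups(text, c, use_rank0_pointers):
    """Process `text` with contexts of size c; return the number of searches."""
    index = Index()
    node = index.find_insert(text[:c])
    for i in range(c, len(text)):
        symbol = text[i]
        is_rank0 = bool(node.symbols) and node.symbols[0] == symbol
        if use_rank0_pointers and is_rank0 and node.rank0_next is not None:
            nxt = node.rank0_next  # follow the pointer, no search needed
        else:
            nxt = index.find_insert(text[i - c + 1:i + 1])
        # Move-to-Front update; the front symbol now leads to nxt.
        if symbol in node.symbols:
            node.symbols.remove(symbol)
        node.symbols.insert(0, symbol)
        node.rank0_next = nxt
        node = nxt
    return index.lookups
```

On highly regular input most symbols have rank 0, so almost all searches are replaced by a pointer dereference, while only one extra pointer per context is stored.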

3.2 Ternary Tree Based Algorithms 

We have considered ternary trees as an alternative data structure for maintaining the
set of contexts (the c-length substrings of the input string). We use a straightforward
modification² of the algorithms described in [2]. The cost of the find/insert operation
on ternary trees depends on the order in which the strings are inserted and is difficult to
express analytically. In practice, the results in [2] show that this is a very efficient data
structure.

The nodes of the ternary tree corresponding to c-length substrings have a pointer
to an MTF list which is handled in the same way as in Section 3.1. The use of ternary
trees instead of skip lists yields a substantial difference in the strategy for coding “new”
symbols. Using ternary trees we no longer have an immediate access to the contexts
which are adjacent (in lexicographic order) to the current context w. However, we can
still access contexts which are “close” to w. Our idea consists in reversing each context
before inserting it in the ternary tree (that is, if $w = \beta_1 \cdots \beta_c$ we start from the root
searching the node corresponding to $\beta_c$, then we search for $\beta_{c-1}$ and so on). In addition,
each time we find/insert a context w we maintain the path $(n_0, n_1, \ldots, n_k)$
going from $n_0$ (the root) to $n_k$ (the node corresponding to w). If the incoming symbol
a is not in the MTF list of w, we search it in the MTF lists of the contexts which can be
found in the subtree rooted at $n_{k-1}$ (excluding $n_k$). If a is not found there, we search
it in the MTF lists of the contexts which can be found in the subtree rooted at $n_{k-2}$
(excluding $n_{k-1}$), and so on.
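
A find/insert operation on a ternary tree with reversed contexts might look as follows. This is a simplified sketch, not the modification of [2] used in the paper: the names `TNode` and `TernaryTree` are hypothetical, and the returned path records only the node matched for each symbol rather than every node traversed.

```python
class TNode:
    __slots__ = ("ch", "lo", "eq", "hi", "mtf")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.mtf = None  # MTF list, present only on nodes ending a context


class TernaryTree:
    def __init__(self):
        self.root = None

    def find_insert(self, context):
        """Locate (or create) the node of `context`, inserted in reverse.

        Returns (node, path), where path holds the node matched for each
        symbol, starting with the one for the last symbol of the context.
        """
        path = []
        parent, attr = self, "root"
        for ch in reversed(context):
            node = getattr(parent, attr)
            # Walk the lo/hi links until the symbol is found or a gap is hit.
            while node is not None and node.ch != ch:
                parent, attr = node, ("lo" if ch < node.ch else "hi")
                node = getattr(parent, attr)
            if node is None:
                node = TNode(ch)
                setattr(parent, attr, node)
            path.append(node)
            parent, attr = node, "eq"
        last = path[-1]
        if last.mtf is None:
            last.mtf = []
        return last, path
```

Because contexts are inserted in reverse, two contexts that share a suffix share a prefix of their paths, which is what makes the "close" contexts reachable from the nodes on the path.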

MTF pointers can be also used in conjunction with ternary trees (MTF pointers
now point to tree nodes corresponding to c-length substrings). Again, we get the best
performance by storing only the rank 0 pointers, that is the MTF pointers corresponding

² Bentley and Sedgewick consider null-terminated strings of different lengths, whereas we need
to store arbitrary strings of length c.





                     # find/insert    Memory    Time
No MTF pointers        1,383,612       3.33     29.07
All MTF pointers         216,890       6.33     23.13
Rank 0 pointers          614,603       3.77     22.03

Table 1. Number of find/insert operations, memory usage (in megabytes) and running time
(in seconds) for three different schemes of MTF pointers usage. The input file was texbook.tex
(1,383,616 bytes) with context size c = 4.
to the first element of each MTF list. A new difficulty unfortunately arises when we
use these pointers. Suppose that the symbol a is “new” for the context w. Our strategy
requires that we search a in the MTF lists of contexts which are “close” to w. However,
if the node corresponding to w has been obtained following an MTF pointer, rather than
performing a tree search, we do not have the path $(n_0, n_1, \ldots, n_k)$ which we
ordinarily use to find contexts similar to w. Initially, we solved this problem adding,
to each node of the tree, a pointer to its parent node so that the path could be
reconstructed. However, since this causes a noticeable increase of memory usage, we
decided to remove this pointer and to execute an additional tree search in the unfortunate
case that a “new” symbol appears after an MTF pointer has been used. Profiling shows
that these additional searches are seldom necessary and that they do not affect significantly
the running time.

The symbol ranking algorithm based on ternary trees and rank 0 pointers has been 
tested in Section 4 under the name Sr_tt. 

3.3 Improved Algorithms 

Profiling shows that when we use skip lists or ternary trees combined with MTF pointers 
the most expensive operation of symbol ranking algorithms is the coding of “new” 
symbols. For most files, such symbols appear rather infrequently, however, since their 
coding requires the scanning of many MTF lists, their effect is noticeable. In this section 
we describe three new algorithms designed to cut down the cost of coding “new” symbols. 

In the first algorithm we set an upper limit to the amount of search we perform for 
each “new” symbol. The algorithm receives an input parameter m; when a “new” symbol
a is not among the first m symbols of the candidate next symbol list the algorithm outputs 
a new_symbol code followed by a. Clearly, this strategy reduces the running time at 
the expense of compression, and we must face the choice of an appropriate parameter 
m. After a few experiments we found that m = 32 represents a good compromise 
between speed and compression. We have applied this technique to Sr_tt and the resulting 
algorithm has been tested in Section 4 under the name Sr_tt_32. 
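
The bounded search can be sketched as follows. This is only an illustration of the idea: `NEW_SYMBOL` is a hypothetical escape marker, and `candidates` is assumed to be any iterable that yields the candidate next symbols in rank order.

```python
from itertools import islice

NEW_SYMBOL = "new_symbol"  # hypothetical escape code


def encode_with_limit(candidates, symbol, m=32):
    """Encode `symbol` by rank, looking only at the first m candidates.

    Returns [rank] when the symbol is found, otherwise the escape code
    followed by the symbol itself.
    """
    for rank, candidate in enumerate(islice(candidates, m)):
        if candidate == symbol:
            return [rank]
    return [NEW_SYMBOL, symbol]
```

Passing a lazy iterator as `candidates` means that the MTF lists beyond the first m symbols are never scanned, which is where the saving in running time comes from.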

We have also implemented a symbol ranking algorithm which closely follows the
framework described in Section 2. In this algorithm “new” symbols are encoded using
contexts of size c − 1, c − 2, . . . , 0. To this end we maintain an MTF list for every context of
size less than or equal to c. If the symbol a has never appeared after the context $w = \beta_1 \cdots \beta_c$,
we search it in the MTF list of the context $\beta_2 \cdots \beta_c$, and so on up to the MTF list of
the empty context (note that these “lower order” contexts are precisely the suffixes of
w). Since the algorithm examines at most c + 1 lists the coding of “new” symbols is
significantly faster. However, we now have the problem of maintaining the MTF lists
of the smaller contexts. Ideally, we should update these lists at each step, which means
c + 1 updates per input symbol. However, we found that updating only the lists which have
been actually searched has a much better compression/time trade-off (in other words,
if a is found in the MTF list of the context $\beta_2 \cdots \beta_c$, we do not update the lists of the
contexts of size 0, 1, . . . , c − 2; this technique is commonly used in PPM compression
algorithms).
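
The fallback to shorter contexts, with updates restricted to the lists actually searched, can be sketched as follows. This is an illustrative toy rather than the Sr_tt_fat implementation: MTF lists live in a dictionary keyed by the context string, and the function returns a pair (context size, rank within that list), which is an assumption about how a result might be reported and not the paper's rank encoding.

```python
from collections import defaultdict


def encode_symbol(lists, history, symbol, c):
    """Search contexts of size c, c-1, ..., 0 and update only searched lists.

    `lists` maps a context string to its MTF list. Returns (size, rank) for
    the first context whose list contains `symbol`, or (None, None).
    """
    searched = []
    result = (None, None)
    for size in range(min(c, len(history)), -1, -1):
        context = history[len(history) - size:]
        mtf = lists[context]
        searched.append(mtf)
        if symbol in mtf:
            result = (size, mtf.index(symbol))
            break
    # Move-to-Front update, restricted to the lists examined above.
    for mtf in searched:
        if symbol in mtf:
            mtf.remove(symbol)
        mtf.insert(0, symbol)
    return result
```

When the symbol is found in a long context the shorter contexts are left untouched, so the common case costs a single list update.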

The MTF lists of the “lower order” contexts can be easily accessed if we store the
c-length contexts using a ternary tree. In fact, every node of the tree corresponds to a
string of length $\ell$ with $1 \le \ell \le c$, and the suffixes of a string w correspond to nodes which
are in the path from the root to the node corresponding to w. When we find/insert
the current context w, we keep track of these nodes so that if the incoming symbol turns
out to be “new” we already have the relevant MTF lists at hand. We have implemented
the above algorithm, with the addition of the rank 0 MTF pointers, and called it Sr_tt_fat
(because of the extra memory it requires). We have tested it using files of the Canterbury
corpus and the results are reported in Section 4.

To overcome the drawback of the extra memory required by Sr_tt_fat, we have
devised a simple variant which is more space economical. In this variant we maintain
MTF lists only for contexts of size 0, 1 and 2 (in addition to the MTF lists for the full
c-length contexts). The elimination of the “middle-level” MTF lists clearly results in a
saving of space. In addition, we can expect that the coding of “new” symbols should
take less time since we search them directly in MTF lists of lower order in which they
are more likely to be found. This variant can be implemented using either ternary trees
or skip lists (since there is a small number of lower order MTF lists they are accessed
directly by table look-up rather than through the data structure used for the contexts as
in algorithm Sr_tt_fat). We decided to use ternary trees with an additional variant which
further reduces the cost of find/insert operations. Since we no longer need to access
the MTF lists of contexts which are similar to the current one, in the ternary tree we
insert each context in its natural order (that is, if the context is $w = \beta_1 \cdots \beta_c$ we start
from the root searching the node corresponding to $\beta_1$, then we search for $\beta_2$ and so
on). At each leaf, in addition to the MTF list and to the rank 0 MTF pointer, we store
a suffix pointer. The suffix pointer goes from the leaf corresponding to $\beta_1 \cdots \beta_c$ to the
(c − 1)-level node corresponding to $\beta_2 \cdots \beta_c$. Since the context following $\beta_1 \cdots \beta_c$ will
be of the form $\beta_2 \cdots \beta_c a$, the suffix pointer enables us to skip c − 1 “levels” in the next
search in the ternary tree. Note that, differently from the MTF pointers, once the suffix
pointer is established it never changes. Since this algorithm uses the contexts in their
natural order we have called it Sr_tt_nat. Its performance on the files of the Canterbury
corpus is reported in Section 4.
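
The role of the suffix pointer can be illustrated with a small sketch. It replaces the ternary tree with a dictionary-based trie for brevity, so it only shows how many levels are skipped, not the actual node layout; `Node`, `NaturalOrderIndex` and the `steps` counter are hypothetical names introduced for this example.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.suffix = None  # set lazily, only on leaves (depth c)


class NaturalOrderIndex:
    """Trie of c-length contexts stored in natural order."""

    def __init__(self, c):
        self.c = c
        self.root = Node()
        self.steps = 0  # number of levels descended so far

    def _child(self, node, ch):
        self.steps += 1
        nxt = node.children.get(ch)
        if nxt is None:
            nxt = node.children[ch] = Node()
        return nxt

    def find_insert(self, string):
        """Descend from the root, one level per symbol."""
        node = self.root
        for ch in string:
            node = self._child(node, ch)
        return node

    def next_leaf(self, leaf, context, symbol):
        """Leaf of the next context, reached through the suffix pointer."""
        if leaf.suffix is None:
            # Established once; it never changes afterwards.
            leaf.suffix = self.find_insert(context[1:])
        return self._child(leaf.suffix, symbol)
```

Once the suffix pointer of a leaf is set, moving to the next context costs a single level instead of c.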



4 Experimental Results 

Since our main interest is in the design of on-line symbol ranking algorithms which are 
time-efficient, we have tested our algorithms using the five largest files of the Canterbury 
corpus [1] (for small files the start-up overheads can dominate the compression time). 
In the full paper we will report the results for the whole Canterbury corpus in order to 
better assess the compression performance of our algorithms on different kinds of input 
files. The files considered here have the following characteristics.

fax (513,216 bytes). Black and white bitmap of an image belonging to the CCITT test
set.

excl (1,029,744 bytes). Excel spreadsheet.

ecol (4,638,690 bytes). Complete genome of the E. coli bacterium (a long sequence
of A, C, T, G characters for the rest of us).

bibl (4,047,392 bytes). The King James version of the Bible.

wrld (2,473,400 bytes). The 1992 CIA world fact book.
Tables 2, 3 and 4 report the performance of the algorithms described in the previous
section. For comparison we also report data for the algorithms gzip (with option -9
for maximum compression) and bzip2 (with option -1 for maximum speed, and option
-9 for maximum compression).



Files:               fax    excl   ecol   bibl   wrld
Sr_sl      c = 4     1.17   0.85   2.02   2.22   2.34
Sr_sl      c = 5     1.16   0.82   2.02   2.11   2.13
Sr_sl      c = 6     1.16   1.14   2.02   2.05   2.04
Sr_tt      c = 4     1.16   0.82   2.02   2.21   2.33
Sr_tt      c = 5     1.15   0.78   2.02   2.10   2.12
Sr_tt      c = 6     1.15   0.98   2.02   2.04   2.02
Sr_tt_32   c = 4     1.21   2.00   2.02   2.22   2.39
Sr_tt_32   c = 5     1.20   1.98   2.02   2.12   2.18
Sr_tt_32   c = 6     1.19   1.99   2.02   2.06   2.08
Sr_tt_fat  c = 4     1.17   0.81   2.02   2.22   2.35
Sr_tt_fat  c = 5     1.16   0.77   2.02   2.12   2.14
Sr_tt_fat  c = 6     1.17   0.83   2.02   2.07   2.05
Sr_tt_nat  c = 4     1.16   0.81   2.02   2.23   2.36
Sr_tt_nat  c = 5     1.15   0.78   2.02   2.14   2.19
Sr_tt_nat  c = 6     1.14   0.84   2.02   2.13   2.17
gzip -9              0.82   1.63   2.24   2.33   2.33
bzip2 -1             0.78   0.96   2.17   1.98   2.17
bzip2 -9             0.78   1.01   2.16   1.67   1.58

Table 2. Compression in bits per symbol (output bits over input bytes). A smaller number denotes
better performance.



Due to the lack of space we cannot comment at length on the performance of the single
algorithms. However, we can clearly see that the structure of the input file influences
not only the compression ratio but also the running time of the algorithms. Consider
for example the file excl. All algorithms, with the exception of Sr_tt_32, achieve a very
good compression (less than 1 bit per symbol) but the running times for gzip and all
algorithms using ternary trees are unusually high (note that the best compression is
achieved by Sr_tt_fat). The results for ecol are also worth commenting on. Since the file
contains only four distinct symbols, one would expect a compression of two bits per
input symbol or better. Our algorithms come very close to this value, but this is not true for
either gzip (which also has a high running time) or bzip2. Finally, it must be noted that
bzip2 preprocesses the input file using run-length encoding. This technique affects the
compression ratio of the files containing long runs of identical symbols (fax and excl in
our test set).

Overall, we can see that bzip2 achieves the best compression/speed trade-off. Its 
running time is also quite “stable”, that is, it is little influenced by the structure of the 
input. Our symbol ranking algorithms usually compress better than gzip and are on a par 
Files:               fax     excl     ecol    bibl    wrld
Sr_sl      c = 4     15.09    26.83    4.41    8.02   12.84
Sr_sl      c = 5     19.80    33.77    5.45   11.04   18.53
Sr_sl      c = 6     22.15    43.53    7.13   13.75   23.90
Sr_tt      c = 4     16.05   228.16    3.93    5.43   10.71
Sr_tt      c = 5     23.12   310.57    4.39    7.19   16.39
Sr_tt      c = 6     29.43   395.44    4.79   10.32   23.59
Sr_tt_32   c = 4      7.01    42.02    4.06    5.19    7.99
Sr_tt_32   c = 5      9.03    57.24    4.44    6.75   11.26
Sr_tt_32   c = 6     10.94    71.70    4.94    9.34   15.44
Sr_tt_fat  c = 4      4.28    19.74    4.64    5.29    6.29
Sr_tt_fat  c = 5      4.71    29.24    5.12    5.93    7.15
Sr_tt_fat  c = 6      5.25    38.79    5.70    6.91    8.44
Sr_tt_nat  c = 4      3.75    10.89    3.66    4.43    5.24
Sr_tt_nat  c = 5      4.07    13.66    3.85    4.76    5.73
Sr_tt_nat  c = 6      4.28    17.34    4.03    5.31    6.60
gzip -9               6.43    40.27   27.72    6.84    3.71
bzip2 -1              2.03     5.01    5.47    5.19    5.31
bzip2 -9              2.03     6.56    6.54    6.01    9.84

Table 3. Running time in milliseconds per input byte.



Files:               fax    excl    ecol   bibl   wrld
Sr_sl      c = 4     2.45    5.57   0.01   0.36   1.56
Sr_sl      c = 5     3.77    9.17   0.01   1.18   3.86
Sr_sl      c = 6     4.36   12.33   0.04   2.64   6.15
Sr_tt      c = 4     2.50    5.61   0.00   0.32   1.40
Sr_tt      c = 5     4.36   10.17   0.01   1.02   3.57
Sr_tt      c = 6     6.32   16.14   0.03   2.50   6.75
Sr_tt_fat  c = 4     2.95    6.67   0.00   0.35   1.56
Sr_tt_fat  c = 5     5.53   12.98   0.01   1.16   4.20
Sr_tt_fat  c = 6     8.47   21.37   0.03   2.97   8.40
Sr_tt_nat  c = 4     2.87    6.42   0.00   0.36   1.60
Sr_tt_nat  c = 5     4.83   11.29   0.01   1.14   4.00
Sr_tt_nat  c = 6     6.86   17.60   0.03   2.78   7.41
bzip2 -1             2.16    1.08   0.24   0.27   0.45
bzip2 -9             7.80    6.52   1.45   1.66   2.71

Table 4. Memory usage per input byte. The memory usage of Sr_tt_32 is the same as that of Sr_tt.



with bzip2 -1. The improved algorithms described in Section 3.3 are usually faster than
gzip, and are as fast as bzip2 for the text files (bibl and wrld) and the genome sequence
(ecol).
5 Conclusions

The concept of symbol ranking is an important one in the field of data compression. The
off-line symbol ranking algorithms (such as bzip and szip) constitute the state of the
art in lossless data compression. In this paper we have shown that on-line symbol
ranking algorithms can also achieve a very good compression/speed trade-off. Since symbol
ranking algorithms are still in their infancy, we can expect further advances in the
future.

More generally, our feeling is that data compression has recently seen the development
of powerful new compression techniques (see [13] for a recent review), but in
many cases new ideas have been implemented with the wrong tools. We believe that the
data structures described in this paper, as well as others well known in the algorithmic
community, can be used to design more efficient compression algorithms.
Abstract. The list update problem, a well-studied problem in dynamic data structures,
can be described abstractly as a metrical task system. In this paper, we prove
that a generic metrical task system algorithm, called the work function algorithm,
has constant competitive ratio for list update. In the process, we present a new
formulation of the well-known “list factoring” technique in terms of a partial order
on the elements of the list. This approach leads to a new simple proof that a large
class of online algorithms, including Move-To-Front, is $(2 - 1/k)$-competitive.



1 Introduction 

1.1 Motivation

The list accessing or list update problem is one of the most well-studied problems in 
competitive analysis [1],[2],[3],[4],[5]. The problem consists of maintaining a set S of 
items in an unsorted linked list, as a data structure for implementation of a dictionary. 
The data structure must support three types of requests: ACCESS(x), INSERT(x) and 
DELETE(x), where x is the name, or “key”, of an item stored in the list. We associate a 
cost with each of these operations as follows: accessing or deleting the i-th item on the 
list costs i; inserting a new item costs j + 1 where j is the number of items currently 
on the list before insertion. We also allow the list to be reorganized, at a cost measured 
in terms of the minimum number of transpositions of consecutive items needed for the 
reorganization. The standard model in the literature is that immediately after an access 
or an insertion, the requested item may be moved at no extra cost to a position closer to 
the front of the list. These exchanges are called free exchanges. Intuitively, using free 
exchanges, the algorithm can lower the cost on subsequent requests. In addition, at any 
time, two adjacent items in the list can be exchanged at a cost of 1. These exchanges are
called paid exchanges. 

The list update problem is to devise an algorithm for reorganizing the list, by per- 
forming free and/or paid exchanges, that minimizes search and reorganization costs. As 
usual, the algorithm will be evaluated in terms of its competitive ratio. 

Many deterministic online algorithms have been proposed for the list update problem. 
Of these, perhaps the most well-known is the Move-To-Front algorithm: after accessing 
an item, move it to the front of the list, without changing the relative order of the other 
items. Move-To-Front is known to be $(2 - \frac{2}{k+1})$-competitive, and this is best possible [2], [7].
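As a concrete illustration of the cost model and of Move-To-Front, the following minimal sketch serves a sequence of accesses in the static case. The function name `move_to_front` and the string encoding of items are assumptions made for this example only; insertions, deletions, and paid exchanges are not modelled.

```python
def move_to_front(initial, requests):
    """Serve access requests with Move-To-Front.

    Returns (total access cost, final list). Static case only: every
    requested item is assumed to be present in the list.
    """
    items = list(initial)
    total = 0
    for x in requests:
        i = items.index(x)             # 0-based position of the requested item
        total += i + 1                 # accessing the i-th item (1-based) costs i
        items.insert(0, items.pop(i))  # free exchange: move x to the front
    return total, items
```

For instance, `move_to_front("abc", "cbc")` serves the requests c, b, c on the initial list a, b, c; the relative order of the items other than the requested one is never changed.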

J. Nešetřil (Ed.): ESA'99, LNCS 1643, pp. 289-300, 1999.

© Springer-Verlag Berlin Heidelberg 1999
The static list update problem (where the list starts out with $k$ elements in it, and all
requests are accesses) can also be considered within the metrical task system framework
introduced by Borodin, Linial and Saks [8].¹ Metrical task systems (MTS) are an abstract
model for online computation that captures a wide variety of online problems (paging,
list update and the $k$-server problem, to name a few) as special cases. A metrical task
system is a system with $n$ states, with a distance function $d$ defined on the states: $d(i, j)$
is the distance between states $i$ and $j$. The distances are assumed to form a metric. The
MTS has a set $T$ of allowable tasks; each task $\tau \in T$ is a vector $(\tau(1), \tau(2), \ldots, \tau(n))$,
where $\tau(i)$ is the cost of processing task $\tau$ in state $i$. An online algorithm is given a
starting state and a sequence of tasks to be processed online, and must decide in which
state to process each task. The goal of the algorithm is to minimize the total distance
moved plus the total processing costs.

The list update problem can be viewed as a metrical task system as follows. The
states of the list update MTS are the $k!$ possible orders the $k$ elements in the list can be
in. There are $k$ tasks, one for each element $x$ in the list; and $\tau_x(\pi)$, the cost of processing
task $\tau_x$ in state $\pi$, is simply the depth of $x$ in the list $\pi$. Finally, the distance between two
states or permutations is just the number of inversions between the permutations.²
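The ingredients of this metrical task system can be written down directly for small lists. The sketch below is illustrative only; the names `depth`, `inversions`, and `states` are assumptions of the example, and lists are encoded as tuples of single-character items.

```python
from itertools import combinations, permutations

def depth(state, x):
    """Task cost tau_x(state): the 1-based position of x in the list."""
    return state.index(x) + 1

def inversions(p, q):
    """Distance d(p, q): number of pairs of elements ordered differently in p and q."""
    pos = {e: i for i, e in enumerate(q)}
    # combinations(p, 2) yields each pair (a, b) with a in front of b in p.
    return sum(1 for a, b in combinations(p, 2) if pos[a] > pos[b])

# The k! states of the list update MTS for k = 3.
states = list(permutations("abc"))
```

The inversion count is symmetric and satisfies the triangle inequality, so it is a valid metric on the $k!$ states.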

One of the initial results about metrical task systems was that the work function
algorithm (WFA) has competitive ratio $2n - 1$ for all MTSs, where $n$ is the number of
states in the metrical task system [8]. It was also shown that this is best possible, in the
sense that there exist metrical task systems for which no online algorithm can achieve
a competitive ratio lower than $2n - 1$. However, for many MTSs the upper bound of
$2n - 1$ is significantly higher than the best achievable competitive ratio. For example,
for list update with $k$ elements in the list, $n = k!$, but we have constant competitive
algorithms. Another example is the $k$-server problem on a finite metric space consisting
of $r$ points. For this problem, the metrical task system has $n = \binom{r}{k}$ states, but a recent
celebrated result of Koutsoupias and Papadimitriou shows that in fact the very same
work function algorithm is $(2k - 1)$-competitive for this problem [9], nearly matching the
known lower bound of $k$ on the competitive ratio [10].

Unfortunately, our community understands very little at this point about how to 
design competitive algorithms that achieve close to the best possible competitive ratio 
for broad classes of metrical task systems. Indeed, one of the most intriguing open 
questions in this area is: For what metrical task systems is the work function algorithm 
strongly competitive?³

Burley and Irani have shown the existence of metrical task systems for which the 
work function algorithm is not strongly competitive [11]. However, these “bad” metrical 

¹ As with much of the work on list accessing, we focus in this paper on the static case, where
there are no insertions and deletions. The results described can be appropriately extended to 
the dynamic case. 

² In this formulation, “free exchanges” are treated as made at unit cost immediately before
the item is referenced. Because the cost of these exchanges is precisely offset by the lower 
reference cost, this model is identical to the standard model. See [6], Theorem 1. We continue 
to use the term “paid exchanges” to describe specifically those exchanges not involving the 
next-referenced element. 

³ We say an algorithm is strongly competitive if its competitive ratio is within a constant factor
of the best competitive ratio achievable. 
task systems seem to be rather contrived, and it is widely believed that the work function
algorithm is in fact strongly competitive for large classes of natural metrical task systems.
The desire to make progress towards answering this big question is the foremost
motivation for the work described in this paper. We were specifically led to reconsider
the list update problem when we observed the following curious fact (Proposition 5,
Section 4): The Move-To-Front algorithm for list update is a work function algorithm.

This observation was intriguing for two reasons. First because it raised the question 
of whether work function algorithms generally (including those with tie-breaking rules 
different from that used in Move-To-Front) are strongly competitive for list update. This 
would provide an example of a substantially different type of metrical task system for 
which the work function algorithm is strongly competitive than those considered in the 
past. 

The second and perhaps more exciting reason for studying work functions as they 
relate to list update is the tantalizing possibility that insight gained from that study 
could be helpful in the study of dynamic optimality for self-adjusting binary search 
trees [1],[12]. It is a long-standing open question whether or not there is a strongly 
competitive algorithm for dynamically rearranging a binary search tree using rotations, 
in response to a sequence of accesses. The similarity between Move-To-Front as an 
algorithm for dynamically rearranging linked lists, and the splay tree algorithm of Sleator 
and Tarjan [12] for dynamically rearranging binary search trees, long conjectured to 
be strongly competitive, is appealing. Our hope is that the use of work function-like 
algorithms might help to resolve this question for self-adjusting binary search trees. 



1.2 Results 

The main result of this paper is a proof that a class of work function algorithms is $O(1)$-competitive
for the list update problem.⁴ Proving this theorem requires getting a handle
on the work function values, the optimal offline costs of ending up in each state. This is 
tricky, as the offline problem is very poorly understood. At present it is even unknown 
whether the problem of computing the optimal cost of executing a request sequence is 
NP-hard. The fastest optimal off-line algorithm currently known runs in time $O(2^k k!\, m)$,
where $k$ is the size of the list and $m$ is the length of the request sequence [6].

Using the framework that we have developed for studying work functions and list 
update, we also present a new simple and illustrative proof that Move-To-Front and a 
large class of other online algorithms are $(2 - 1/k)$-competitive.

The rest of the paper is organized as follows. In Section 2, we present background 
material on work functions and on the work function algorithm. In Section 3, we present 
a formulation of the list update work functions in terms of a partial order on the elements 
of the list and use this formulation to prove that a large class of list update algorithms 
are $(2 - 1/k)$-competitive. Finally, in Section 4 we present our main result, that work
function algorithms are strongly competitive for list update. The proof relies on an 
intricate construction; a number of technical details are omitted for lack of space. 

⁴ The proof does not achieve the best possible competitive ratio of 2.
2 Background

We begin with background material on work functions and work function algorithms. 

Consider an arbitrary metrical task system, with states $s \in S$ and tasks $\tau \in T$.
We define the work function $\omega_t(s)$ for any state $s$ and index $t$ to be the lowest cost of
satisfying the first $t$ requests of $\sigma$ and ending up in state $s$ [13], [8]. Suppose that $\sigma_{t+1}$, the
$(t+1)$st request in $\sigma$, is the task $\tau$. Because the states and task costs are time-independent,
the work functions can be calculated through a dynamic programming formulation (this
can be taken as the definition):

$$\omega_{t+1}(s') = \min_{s} \bigl(\omega_t(s) + \tau(s) + d(s, s')\bigr) \qquad (1)$$

where $\tau(s)$ is the cost of executing task $\tau$ in state $s$. We note three elementary identities,
which hold at all times $t$, and for all states $s$ and $s'$: (1) $\omega_{t+1}(s) \ge \omega_t(s)$, (2) $\omega_{t+1}(s) \le
\omega_t(s) + \sigma_{t+1}(s)$, and (3) $\omega_t(s) \le \omega_t(s') + d(s, s')$.
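Equation (1) translates into a short dynamic program. The sketch below runs it on the smallest list update instance, a two-element list; the dictionary encoding of the MTS, the state names `"ab"` and `"ba"`, and the initialization $\omega_0(s) = d(s_0, s)$ are assumptions of this example (the initialization is the usual convention but is not spelled out in the text).

```python
def next_work_function(w, tau, d):
    """Equation (1): w_next(s2) = min over s of w[s] + tau[s] + d[s][s2]."""
    return {s2: min(w[s] + tau[s] + d[s][s2] for s in w) for s2 in w}

# Two-element list: state "ab" has a in front, state "ba" has b in front.
d = {"ab": {"ab": 0, "ba": 1}, "ba": {"ab": 1, "ba": 0}}
# tau[x][s] is the depth of x in state s.
tau = {"a": {"ab": 1, "ba": 2}, "b": {"ab": 2, "ba": 1}}

w = {"ab": 0, "ba": 1}        # assumed start: w_0(s) = d(s_0, s) with s_0 = "ab"
history = [w]
requests = "bba"
for x in requests:
    w = next_work_function(w, tau[x], d)
    history.append(w)
```

On this instance the three identities can be checked directly on `history`: the values never decrease, each step adds at most the task cost, and the two states always differ by at most their distance of 1.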

The Work Function Algorithm (WFA) [13], [8], defined on an arbitrary metrical
task system, is the following: when in state $s_t$, given a request $\sigma_{t+1} = \tau$, service $\tau$ in
the state $s_{t+1}$ such that

$$s_{t+1} = \arg\min_{s} \bigl(\omega_{t+1}(s) + d(s_t, s)\bigr) \qquad (2)$$

where the minimum is taken over states $s$ that are fundamental at time $t+1$, i.e., they
must satisfy $\omega_{t+1}(s) = \omega_t(s) + \tau(s)$. Combining these two equations implies that $s_{t+1}$
is chosen so that

$$s_{t+1} = \arg\min_{s} \bigl(\omega_t(s) + \tau(s) + d(s_t, s)\bigr). \qquad (3)$$

This algorithm can be viewed as a compromise between two very natural algorithms:
(1) A greedy algorithm which tries to minimize the cost spent on the current step,
i.e., services the $(t+1)$st request $\tau$ in a state $s$ that minimizes $d(s_t, s) + \tau(s)$. (2) A
retrospective algorithm, which tries to match the optimal offline algorithm, i.e., chooses
to service the $(t+1)$st request $\tau$ in a state $s$ that minimizes $\omega_{t+1}(s)$.

Each of these algorithms is known to be noncompetitive for many natural problems. 
WFA combines these approaches and, interestingly, this results in an algorithm which 
is known to be strongly competitive for a number of problems for which both the greedy 
and retrospective algorithms are not competitive.⁵

A variant of this work function algorithm, which we'll call WFA', is to service the
request $\tau$ in the state $s_{t+1}$ such that

$$s_{t+1} = \arg\min_{s} \bigl(\omega_{t+1}(s) + \tau(s) + d(s_t, s)\bigr). \qquad (4)$$

The difference between WFA and WFA' is in the subscript of the work function. We 
actually feel that WFA' is a slightly more natural algorithm, in light of the discussion 
above about combining a greedy approach and a retrospective approach. It is this latter 
work function algorithm WFA' that we will focus on in this paper. Our proof that WFA' 

⁵ Varying the relative weighting of the greedy and retrospective components of the work function
algorithm was explored in [14].
is $O(1)$-competitive for list update can be extended to handle WFA as well, though the
proofs will be omitted in this extended abstract.⁶
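The only difference between the selection rules (3) and (4) is which work function is consulted, and this can be seen on a tiny example. The sketch below is illustrative: the function names are assumptions, the MTS is the two-element list with states `"ab"` and `"ba"`, and the work function values `w0`, `w1`, `w2` are those obtained from Equation (1) for two consecutive requests to b starting from `"ab"` (with the assumed initialization $\omega_0(s) = d(s_0, s)$). Ties are broken by dictionary order, which is an arbitrary choice of this sketch.

```python
def wfa_choice(w_t, tau, d, s_t):
    """Rule (3): minimize w_t(s) + tau(s) + d(s_t, s)."""
    return min(w_t, key=lambda s: w_t[s] + tau[s] + d[s_t][s])

def wfa_prime_choice(w_next, tau, d, s_t):
    """Rule (4): minimize w_{t+1}(s) + tau(s) + d(s_t, s)."""
    return min(w_next, key=lambda s: w_next[s] + tau[s] + d[s_t][s])

d = {"ab": {"ab": 0, "ba": 1}, "ba": {"ab": 1, "ba": 0}}
tau_b = {"ab": 2, "ba": 1}          # cost of a request to b in each state

w0 = {"ab": 0, "ba": 1}             # before any request
w1 = {"ab": 2, "ba": 2}             # after the first request to b
w2 = {"ab": 4, "ba": 3}             # after the second request to b

first_wfa = wfa_choice(w0, tau_b, d, "ab")                # uses w_0 for request 1
second_wfa_prime = wfa_prime_choice(w2, tau_b, d, "ab")   # uses w_2 for request 2
```

On the first request, rule (3) stays in `"ab"` (cost 2 against 3). On the second request, rule (4) evaluated from `"ab"` prefers `"ba"` (cost 5 against 6), that is, it moves b to the front.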



3 A Different View on List Factoring 

A technique which has been used in the past to analyze list update algorithms is the list
factoring technique, which reduces the competitive analysis of list accessing algorithms
to lists of size two [3], [7], [15], [4], [16]. For example, this technique, in conjunction
with phase partitioning, was used to prove that an algorithm called TimeStamp is 2-competitive
[4], [16]. In this section, we repeat the development of this technique, but
present it in a somewhat different way, in terms of a partial order on elements.⁷ This
view leads us to a simple generalization of previous results and will assist us in our study
of WFA'.

Consider the metrical task system corresponding to a list of length two. In this case
there are two lists, $(a, b)$ ($a$ in front of $b$) and $(b, a)$ ($b$ in front of $a$), and the distance
between these two states is 1. Since $\omega_t((a,b)) - 1 \le \omega_t((b,a)) \le \omega_t((a,b)) + 1$ for any
$t$, the work functions at any time can be characterized by one of three distinct properties:

• $\omega_t((a,b)) < \omega_t((b,a))$, which we denote $a \succ b$,

• $\omega_t((a,b)) = \omega_t((b,a))$, which we denote $a \sim b$, or

• $\omega_t((a,b)) > \omega_t((b,a))$, which we denote $a \prec b$.

It is easy to verify directly from Equation 1 the transitions between these three properties
as a result of references in the string $\sigma$.




Fig. 1. The three-state DFA: the state $a \succ b$ corresponds to the case $\omega_t((a,b)) = \omega_t((b,a)) - 1$,
the state $a \sim b$ corresponds to the case $\omega_t((a,b)) = \omega_t((b,a))$, and the state $a \prec b$ corresponds
to the case $\omega_t((a,b)) = \omega_t((b,a)) + 1$.



The resulting three-state DFA shown in Figure 1 can be used to completely characterize
the work functions, the optimal offline list configuration, and the optimal cost to
service a request sequence $\sigma$. The start state of the DFA is determined by the initial order

⁶ In addition, it is easy to show that prior results which hold for WFA also hold for WFA'. For
example, WFA' is $(2n - 1)$-competitive for any metrical task system with $n$ states, and WFA'
is $(2k - 1)$-competitive for the $k$-server problem.

⁷ This partial order has apparently been considered by Albers, von Stengel and Werchner in the
context of randomized list update, and was used as a basis for an optimal randomized online
algorithm for lists of length 4 [17].
of the elements in the list: it is $a \succ b$ if the initial list is $(a, b)$ and $a \prec b$ if the initial list
is $(b, a)$. Each successive request in $\sigma$ results in a change of state in accordance with the
transitions of the DFA, reflecting the work function values after serving that request. It
is easily verified that the optimal cost of satisfying a sequence $\sigma$ is precisely the number
of references in the sequence plus the number of transitions into the middle DFA-state,
i.e., the number of times $a$ is referenced in the state $a \prec b$ plus the number of times $b$ is
referenced in the state $a \succ b$. The corresponding optimal offline strategy is: immediately
before two or more references in a row to the same element, move that element to the
front of the list.
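The three-state DFA and the resulting optimal cost for a list of length two can be sketched in a few lines. The encoding of the states as $+1$ ($a \succ b$), $0$ ($a \sim b$) and $-1$ ($a \prec b$), and the function name `pair_optimal_cost`, are assumptions of this example; a reference to an element simply moves the state one step towards that element.

```python
def pair_optimal_cost(requests, front="a", back="b"):
    """Optimal offline cost for the two-element list (front, back).

    State +1 means front > back in the partial order, 0 means the two are
    incomparable (the middle DFA state), and -1 means front < back.
    Returns (optimal cost, final DFA state).
    """
    state = 1                      # initial list is (front, back)
    cost = 0
    for x in requests:
        step = 1 if x == front else -1
        new_state = max(-1, min(1, state + step))
        cost += 1                  # every reference costs at least 1
        if new_state == 0:
            cost += 1              # a transition into the middle state costs 1 more
        state = new_state
    return cost, state
```

For the request sequence b, b on the initial list (a, b), this gives cost 3, which agrees with the minimum work function value obtained from Equation (1) on the same instance.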

Now consider list update for a list of length $k$. The cost of an optimal sequence
can be written as the sum of (i) the distances between successive states (the number
of exchanges performed)⁸ and (ii) the reference costs at each state. The standard list
factoring approach is to describe the cost of any optimal sequence for satisfying $\sigma$ by
decomposing it into $|\sigma|$ plus the sum over all pairs $(a, b)$ of (i) the exchanges between
$a$ and $b$, and (ii) the pairwise incremental costs, i.e., the cost attributed to $a$ when $b$ is
referenced but $a$ is in front of $b$ in the list and the cost attributed to $b$ when $a$ is referenced
but $b$ is in front of $a$ in the list. But for any pair $(a, b)$, the pairwise transpositions and
the pairwise cost of references are a (perhaps suboptimal) solution to the list of length
two problem for the subsequence of $\sigma$ consisting of references only to $a$ and $b$. Thus $|\sigma|$
plus the sum of the costs of the optimal length-two solutions over all pairs $a, b$ is a lower
bound for the optimal cost of satisfying $\sigma$.⁹

3.1 The Partial Order 



Fig. 2. Illustration of the evolution of the partial order on three elements in response to the request
sequence $\sigma = x_3, x_2, x_3, x_2$, assuming the initial list is ordered $x_1, x_2, x_3$ from front to back. As
usual, a directed edge from $a$ to $b$ indicates that $a \succ b$ in the partial order, whereas the absence of
an edge indicates that $a \sim b$.



We are thus led to consider the collection of $k(k-1)/2$ pairwise three-state DFAs,
one for each pair $a, b$ of elements in the list of length $k$. Consider the result of executing all

⁸ Recall that in our model we charge for each exchange, whether “paid” or “free”; the cost of
free exchanges in the standard model precisely corresponds in our model to a reduced reference
cost on the immediately following reference. See [6], Theorem 1.

⁹ The lower bound is not tight; for a list of length five, initialized abcde, the sequence $\sigma$ =
ebddcceacde provides one counterexample.
these DFAs in parallel in response to requests in $\sigma$, starting from the states corresponding
to the initial list. Figure 2 shows an example. It is easy to verify that the resulting states
define a valid partial order on the $k$ elements of the list. (For example, Move-To-Front
is always consistent with this partial order.)

Define by $G_t$ (respectively $I_t$) the number of elements greater than (respectively
incomparable to) $\sigma_t$ in this partial order immediately prior to its reference at time $t$. By
the discussion above, the optimal cost of servicing a request sequence $\sigma$ of length $n$ and
ending up in any state $s$ is bounded below as follows: $\omega_n(s) \ge n + \sum_{1 \le t \le n} G_t$.

An easy counting argument shows that $\sum_t I_t \le \sum_t G_t$: $\sum_t G_t$ is the cumulative
number of transitions into middle states $a \sim b$ of the DFAs, $\sum_t I_t$ is the cumulative
number of transitions out of middle states, and the starting state is always either $a \succ b$
or $a \prec b$ (not a middle state). Since you can't transition out of a middle state until you
have transitioned into one, we have shown that

Lemma 1. At all times $T$, $\sum_{t \le T} I_t \le \sum_{t \le T} G_t$.
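The quantities $G_t$ and $I_t$ can be computed by running all pairwise DFAs in parallel. The following sketch does so for the example of Figure 2; the function name `partial_order_counts` and the encoding `rel[a][b]` in $\{+1, 0, -1\}$ (for $a \succ b$, $a \sim b$, $a \prec b$) are assumptions of this example.

```python
def partial_order_counts(initial, requests):
    """Run the k(k-1)/2 pairwise DFAs in parallel; return the lists (G, I).

    rel[a][b] is +1 if a > b, 0 if a and b are incomparable, and -1 if a < b
    in the partial order; rel[b][a] is always kept equal to -rel[a][b].
    """
    rel = {a: {b: 0 for b in initial if b != a} for a in initial}
    for i, a in enumerate(initial):
        for b in initial[i + 1:]:
            rel[a][b], rel[b][a] = 1, -1      # a starts in front of b
    G, I = [], []
    for x in requests:
        G.append(sum(1 for y in rel[x] if rel[x][y] == -1))  # elements greater than x
        I.append(sum(1 for y in rel[x] if rel[x][y] == 0))   # elements incomparable to x
        for y in rel[x]:
            v = min(1, rel[x][y] + 1)          # each pair (x, y) moves one step towards x
            rel[x][y], rel[y][x] = v, -v
    return G, I

G, I = partial_order_counts(["x1", "x2", "x3"], ["x3", "x2", "x3", "x2"])
```

Summing `G` gives the lower bound $n + \sum_t G_t$ on the optimal cost, and comparing the running sums of `I` and `G` illustrates Lemma 1 on this sequence.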

This leads to a new, very simple proof that a collection of algorithms already known
to be competitive, including Move-To-Front, TimeStamp, and many others, are all
$(2 - 1/k)$-competitive.

Theorem 2. Any online list update algorithm that performs only free exchanges and
maintains the invariant that the list order is consistent with the partial order is
$(2 - 1/k)$-competitive.

Proof. Any online algorithm $A$ that maintains a list order consistent with the partial order
and performs no paid exchanges has a total cost $A(\sigma)$ satisfying $A(\sigma) \le n + \sum_t (I_t + G_t)$,
where $|\sigma| = n$.

By Lemma 1 and the fact that $OPT(\sigma) \le kn$, we can conclude that $A(\sigma) \le
n + 2\sum_t G_t \le (2 - 1/k)\,OPT(\sigma)$. □
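The final inequality can be unpacked as follows. This is a worked restatement using only the lower bound $OPT(\sigma) \ge n + \sum_t G_t$, Lemma 1, and $OPT(\sigma) \le kn$; the intermediate steps are spelled out here and are not given explicitly in the text.

```latex
\begin{align*}
A(\sigma) &\le n + \sum_t (I_t + G_t) \\
          &\le n + 2\sum_t G_t
             && \text{by Lemma 1} \\
          &= 2\Bigl(n + \sum_t G_t\Bigr) - n \\
          &\le 2\,OPT(\sigma) - \frac{OPT(\sigma)}{k}
             && \text{since } OPT(\sigma) \ge n + \sum_t G_t \text{ and } n \ge OPT(\sigma)/k \\
          &= \Bigl(2 - \frac{1}{k}\Bigr)\,OPT(\sigma).
\end{align*}
```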

The following corollary is also immediate from Theorem 2, since both Move-To-Front 
and TimeStamp [4] maintain a list consistent with the partial order. 

Corollary 3. Move-To-Front and TimeStamp are $(2 - 1/k)$-competitive.



4 On the Performance of Work Function Algorithms 

4.1 Preliminaries 

We begin with some definitions and facts. In what follows, the $(t+1)$st request $\sigma_{t+1}$ is
$x$, and the task cost $\tau_x(s)$ is denoted $x(s)$. We also define the binary relation on two
states, s tx s', if s' can be derived from s by moving x forward (including s = s') while 
leaving the relative positions of other elements undisturbed. 

Ran El-Yaniv has recently presented a different family of algorithms, all of which are $(2 - 1/k)$-competitive
[5]. All algorithms in this family maintain lists consistent with the partial order and
incur only free exchanges, and hence can also be proved $(2 - 1/k)$-competitive using this result.
We say that the state $s$ is wfa-eligible at time $t$ if it minimizes the expression $\omega_{t+1}(s) +
x(s) + d(s_t, s)$. We say that the state $s$ is fundamental at time $t$ iff $\omega_{t+1}(s) = \omega_t(s) +
x(s)$.¹¹

We omit the easy proofs of the following facts. 

Proposition 4. Let $s$ be an arbitrary state. Then:

1. $\omega_{t+1}(s) = \omega_{t+1}(f) + d(f, s)$ for some state $f$ that is fundamental at time $t$. (The
state $s$ is derived from some fundamental state.)

2. Suppose $\omega_{t+1}(s) = \omega_{t+1}(f) + d(f, s)$ where $f$ is a fundamental state. Then
$x(f) \le x(s)$. (The depth of $x$ in the fundamental state $f$ is no greater than the depth of
$x$ in $s$.)

3. If $s$ is wfa-eligible at time $t$, and $\omega_{t+1}(s) = \omega_{t+1}(f) + d(f, s)$, where $f$ is a
fundamental state at time $t$, then $f$ is wfa-eligible at time $t$ and $x(f) = x(s)$. (The
fundamental state $f$ is also wfa-eligible if $s$ is.)

4. If s tx s', then $\omega_{t+1}(s) \ge \omega_{t+1}(s')$. (Moving $x$ forward cannot increase the work
function.)

We can now show (proof omitted in this extended abstract) that (a) there always exists a
wfa-eligible state (the MTF state is one such) that requires no paid exchanges, and (b)
that with such a restriction WFA' is equivalent to an algorithm we call Move-To-Min-$\omega$
(Mtm$\omega$) defined as follows: On a reference to $x$, move $x$ forward to a state with lowest
work function value immediately after the reference. In other words, if $s_t$ is the state
the algorithm is in immediately before servicing the $(t+1)$st request $\sigma_{t+1}$, then Mtm$\omega$
moves to a state $s_{t+1}$ that minimizes $\omega_{t+1}(s)$ over the states $s$ obtainable from $s_t$ by
moving $x$ forward, and satisfies $\sigma_{t+1}$ there.

Summarizing: 

Proposition 5. Mtm$\omega$ is a special case of WFA' and Move-To-Front is a special case
of Mtm$\omega$.
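For very small lists, Move-To-Min-$\omega$ can be simulated by brute force over all $k!$ states. The sketch below is only meant to make the definition concrete: all function names are assumptions of the example, the work functions are initialized with $\omega_0(s) = d(s_0, s)$ (an assumed convention), and ties among candidate states are broken in favour of moving $x$ further forward, which is one arbitrary choice among the tie-breaking rules the definition allows.

```python
from itertools import combinations, permutations

def depth(state, x):
    """Task cost: 1-based position of x in the list."""
    return state.index(x) + 1

def inversions(p, q):
    """Distance between two list orders: number of inverted pairs."""
    pos = {e: i for i, e in enumerate(q)}
    return sum(1 for a, b in combinations(p, 2) if pos[a] > pos[b])

def work_functions(initial, requests):
    """Return [w_0, w_1, ...], each a dict state -> work function value (Equation 1)."""
    states = list(permutations(initial))
    w = {s: inversions(tuple(initial), s) for s in states}   # assumed w_0(s) = d(s_0, s)
    history = [w]
    for x in requests:
        w = {s2: min(w[s] + depth(s, x) + inversions(s, s2) for s in states)
             for s2 in states}
        history.append(w)
    return history

def move_forward_candidates(state, x):
    """States obtained from `state` by moving x forward (front-most first),
    including leaving x where it is; other elements keep their relative order."""
    i = state.index(x)
    rest = state[:i] + state[i + 1:]
    return [rest[:j] + (x,) + rest[j:] for j in range(i + 1)]

def mtm_omega(initial, requests):
    """Simulate Move-To-Min-omega; return the list of states after each request."""
    history = work_functions(initial, requests)
    s = tuple(initial)
    trace = []
    for t, x in enumerate(requests):
        w_next = history[t + 1]
        s = min(move_forward_candidates(s, x), key=lambda c: w_next[c])
        trace.append(s)
    return trace
```

The running time is exponential in $k$, so this is useful only for experimenting with lists of three or four elements.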

4.2 WFA' Is $O(1)$-Competitive for List Update

The technically challenging part of the proof is the following lemma. 

Lemma 6. Consider $\sigma = \sigma_1, x, \sigma_2, x$, where in $\sigma_2$ there are no references to $x$, and
$|\sigma| = t$. Let $S$ be any fundamental state at the final time step $t$.

Let $N$ be the set of elements that are not referenced in $\sigma_2$ that are in front of $x$ in $S$,
and let $R$ be the set of elements that are referenced in $\sigma_2$. Also, let $\hat{S}$ be $S$ with $x$ moved
forward just in front of the element in $N$ closest to the front of the list. Then

$$\omega_t(\hat{S}) \le \omega_t(S) + |R| - |N|. \qquad (5)$$



Proof (sketch).

Suppose $O$ is an optimal sequence ending in $S$ after satisfying $\sigma_1, x, \sigma_2, x$, so that
the cost of $O$ is the work function value $\omega_t(S)$. Let $T$ denote the state in which $O$
satisfies the penultimate reference to $x$. We note that, at the point immediately prior
to the penultimate reference to $x$ (at time $k$, say), the cost of $O$ is $\omega_{k-1}(T)$. In this
construction, we modify $O$ between $T$ and $S$ so as to obtain a state $\hat{S}$, derived from $S$
by moving $x$ forward, with $\omega_t(\hat{S}) \le \omega_t(S) - |N| + |R|$.

¹¹ These definitions, and the first three facts, are valid for all metrical task systems.

Let $\tilde{N}$ denote the total number of elements not referenced between $\sigma_k = x$ and
$\sigma_t = x$. (This set specifically includes $x$, and is potentially much larger than $N$, which
is the number of such elements in front of $x$ in $S$.) Order these non-referenced elements
$p_1, \ldots, p_{\tilde{N}}$ in the order they occur in the state $T$.

The construction of the lower-cost state $\hat{S}$ proceeds in three stages:

1. Rearrange the respective order of the non-referenced elements within $T$ to obtain
some state $T'$. In $T'$, $x$ will occupy the location of the front-most non-referenced element
in $T$. All other non-referenced elements $p$ in $T'$ will satisfy the non-decreasing depth
property, that $p(T) \le p(T')$.¹² All referenced elements remain at their original depths.
(The specific definition of the state $T'$ will emerge from the rest of the construction; the
cost can be bounded by using only the non-decreasing depth property.) Evaluate $\sigma_k = x$
in this state $T'$.

Denoting by $I[X, Y]$ the number of interchanges of non-referenced elements other
than $x$ between states $X$ and $Y$, it is straightforward to show (using the non-decreasing
depth property) that $x(T') + d(T, T') \le x(T) + |R| + I[T, T']$.

2. Considering $O$ as a sequence of transpositions and references transforming $T$ to
$S$, $O : T \to S$, apply a suitably chosen subsequence $O'$ of $O$, including all of the references
and many of the transpositions. This subsequence $O'$ will transform $T'$ to a state
$S'$. In this state $S'$, (i) each referenced element has the same depth as it does in $S$; (ii)
the element $x$ occupies the position of the front-most non-referenced element in $S$, and
(iii) all other non-referenced elements in $S'$ are in their same respective pairwise order
as in $S$. Evaluate $x$ in $S'$.

In Proposition 7, we show that a transformation from $T'$ with the non-decreasing
depth property to $S'$ as so defined can be achieved by a suitably chosen subsequence of
$O$. We also show that $I[T, T'] + |O'| \le |O|$, where $|O|$ denotes the cost of the sequence
$O$.

3. Transform $S'$ to the state $\hat{S}$, where $\hat{S}$ is defined by (i) $\hat{S}$ is derived from $S$ by moving
$x$ forward, and (ii) the depth of $x$ in $\hat{S}$ is the depth of the front-most non-referenced
element in $S$ (which is also its depth in $S'$).

It is straightforward to show that $x(S') + d(S', \hat{S}) + |N| \le x(S)$. The result now
follows by comparing the cost of the modified sequence from and after $\omega_{k-1}(T)$ to the
cost of the original sequence.

We now address the most intricate part of the construction:

Proposition 7. Suppose $S'$ is derived from $S$, such that (i) all referenced elements $p$
have $p(S') = p(S)$, (ii) $x$ occupies in $S'$ the position of the front-most non-referenced
element in $S$, and (iii) all other non-referenced elements are in their same respective
order in $S'$ as in $S$. Then there is a $T'$ with the non-decreasing depth property, and a
subsequence $O' \subseteq O$, such that (i) $O'(T') = S'$, and (ii) the cost of $O$ is at least the
cost of $O'$ plus the cost of interchanges $I[T', T]$ of non-referenced elements (other than
$x$) necessary to derive $T'$ from $T$.

¹² Recall that we denote the depth of an element $p$ in the state $X$ by $p(X)$.
Proof (sketch). As above, we denote by $p_i$ the non-referenced element occupying the
$i$'th non-referenced position in $T$. For convenience, let $z$ denote the location of $x$ as a
non-referenced element in $T$,

We proceed by iteratively constructing T' from the end of the list, beginning with 
The location of referenced elements remains fixed throughout the construction.
As a result, we consider only the $\tilde{N}$ positions of non-referenced elements. For
convenience, we describe the iteration as proceeding from $i = \tilde{N}$ to $i = 1$. (The “base
case” is denoted by “$i = \tilde{N} + 1$”.) At each step, then, we define a map $O_i : T'_i \to S'$.
The non-decreasing depth property is maintained for the elements (other than $x$) in

(S") = T/ that occupy the locations i through N in T/. We show that any necessary 
interchanges of elements as we proceed from T/ to correspond to transpositions in 

For each pair of elements $p, q \ne x$ at locations $i$ and below in $T'_i$, we can determine
whether these two elements are in the same or in the opposite order in $T$. We denote by
$I_i[T'_i, T]$ the number of pairwise inversions of such elements (other than $x$). We denote
by $|O|$ (respectively, $|O_i|$) the number of transpositions in the sequence $O$ (respectively,
$O_i$).

Formally, we can prove by induction (details omitted in this extended abstract) that
for each $i$:

1. $O_i(T'_i) = S'$ (and $O_i^{-1} : S' \to T'_i$)

2. $O_i \subseteq O$ in the sense of a subsequence of transpositions, and $|O| \ge |O_i| + I_i[T'_i, T]$
(all swaps and inversions are accounted for)

3. $x(T'_i) \le p_i(T)$ ($x$ is no deeper than position $i$)

4. $\forall p \ne x$ with $p(T) \ge p_i(T)$, $p(T'_i) \ge p(T)$ (all elements other than $x$ at position $i$
or below in $T$ have the non-decreasing depth property)

5. $\forall p, q \ne x$ with $p(T), q(T) < p_i(T)$:

a) $p(S) < x(S) \Rightarrow p(T'_i) \ne p(T)$, and $p(S) > x(S) \Rightarrow p(T'_i) = p(T)$

b) $p(T) = q(T'_i) \Rightarrow p(S) > q(S)$

We define $T' = T'_1$, and note that the non-decreasing depth property is satisfied for
all $p_i \ne x$. We define $O' = O_1$, and note that all of the inversions between non-referenced
elements in $T'$ have been accounted for, i.e., $I[T, T'] + |O'| \le |O|$. Finally, we repeat that
because the only transpositions removed from $O$ are between non-referenced elements,
the depths, and thus the reference costs, of all referenced elements remain identical
between $O$ and $O'$. □

We obtain the following corollary to Lemma 6. 

Corollary 8. Consider a request sequence $\sigma$ where the last request (the $t$-th request in
$\sigma$) is to $x$. If $s$ is wfa-eligible after executing $\sigma$, then the depth of $x$ in $s$ is at most $2|R|$,
where $R$ is the set of elements that have been referenced since the penultimate reference
to $x$.

Proof. Let $f$ be a fundamental state such that $\omega_{t+1}(s) = \omega_{t+1}(f) + d(f, s)$. By Proposition 4,
$f$ is also wfa-eligible and $x(f) = x(s)$. Suppose $x(s) > 2|R|$. Then $x(f) > 2|R|$.
Elements in front of $x$ in $f$ either have or have not been referenced since the penultimate

We use the terms “position” and “location” interchangeably to refer to the respective positions
of non-referenced elements in $T$.
reference to $x$; so $x(f) > 2|R|$ implies $|N| > |R|$, where $N$ is the set of elements in front
of $x$ in $f$ that have not been referenced since the penultimate reference to $x$. Then by
Lemma 6 there exists a state $\hat{f}$, derived from $f$ by moving $x$ forward, with $\omega_t(\hat{f}) < \omega_t(f)$,
contradicting the assumption that $f$ is wfa-eligible. □

Finally, we use the lemma to obtain the main theorem. 

Theorem 9. WFA' is $O(1)$-competitive.

Proof (sketch). We consider only Mtm$\omega$ here. Consider an arbitrary element $x$ and
let $\sigma = \sigma_0, x, \sigma_1, x, \sigma_2, x$, where in $\sigma_1$ and $\sigma_2$ there are no references to $x$. Then by
Lemma 6, in the Mtm$\omega$ state immediately before the final reference to $x$, $x$ is at depth at
most $2r_1 + X_2$, where $r_1$ is the number of distinct elements referenced in $\sigma_1$ and $X_2$ is
the number of distinct elements referenced in $\sigma_2$, not referenced in $\sigma_1$, that are moved
in front of $x$ at some point during the subsequence $\sigma_2$.

As usual, let $G$ be the number of elements greater than $x$ immediately before its final
reference and let $I$ be the number of elements incomparable to $x$ immediately before its
final reference. In addition, let $L(0)$ be the number of elements less than $x$ immediately
before its final reference that were incomparable to $x$ immediately before the penultimate
reference to $x$. We therefore have $r_1 + X_2 \le G + I + L(0)$.

A simple counting argument similar to the proof of Lemma 1 proves that L(0)t < 

Gf. Taken together with Lemma 1, we obtain the theorem. □ 

A fairly easy extension of the ideas presented here can be used to show that both
WFA' and WFA are O(1) competitive, even when the algorithms are allowed to
perform paid exchanges.

It is fairly clear that our analyses of these algorithms are not tight. However, it is
easy to show that WFA, even without paid exchanges, is no better than 3-competitive.

Abstract. In the k-Server Problem we wish to minimize, in an online fashion,
the movement cost of k servers in response to a sequence of requests. The request
issued at each step is specified by a point r in a given metric space M. To serve
this request, one of the k servers must move to r. (We assume that $k \ge 2$.)

It is known that if M has at least k + 1 points then no online algorithm for the
k-Server Problem in M has competitive ratio smaller than k. The best known
upper bound on the competitive ratio in arbitrary metric spaces, by Koutsoupias
and Papadimitriou [6], is 2k − 1. There are only a few special cases for which
k-competitive algorithms are known: for k = 2, when M is a tree, or when M
has at most k + 2 points.

The main result of this paper is that the Work Function Algorithm is 3-competitive
for the 3-Server Problem in the Manhattan plane. As a corollary, we obtain a
4.243-competitive algorithm for 3 servers in the Euclidean plane. The best previously
known competitive ratio for 3 servers in these spaces was 5.



1 Introduction 

The k-Server Problem is defined as follows: we are given k mobile servers that reside 
in a metric space M. A sequence of requests is issued, where each request is specified 
by a point $r \in M$. To “satisfy” this request, one of the servers must be moved to r, at a
cost equal to the distance from its current location to r. An algorithm A for the k-Server
Problem decides which server should be moved at each step. A is said to be online if its
decisions are made without the knowledge of future requests. Our goal is to minimize
the total service cost.

We define an online algorithm A to be C-competitive if the cost incurred by A to
service each request sequence $\varrho$ is at most C times the optimal (offline) service cost for
$\varrho$, plus possibly an additive constant independent of $\varrho$. The competitive ratio of A is the
smallest C for which A is C-competitive.

The k-Server Problem was introduced by Manasse, McGeoch and Sleator [8], who
proved that no online algorithm can have a competitive ratio smaller than k if a metric
space has at least k + 1 points, and they presented an algorithm for the 2-Server Problem
which is 2-competitive, and thus optimal, for any metric space. They also proposed the
k-Server Conjecture, stating that, for each $k \ge 3$, there exists a k-competitive algorithm
that works in all metric spaces. So far, this conjecture has been settled only in a number
of special cases, including trees and spaces with at most k + 2 points [1,2,7]. Even some
simple-looking special cases remain open, for example the 3-Server Problem on the
circle, in the plane, or in 6-point spaces.

Recent research on the k-Server Conjecture has focussed on the Work Function
Algorithm (WFA), as a possible candidate for a k-competitive algorithm. WFA is a
tantalizingly simple algorithm that, at each step, chooses a server so as to minimize 
the sum of two quantities: the movement cost at this step, and the optimal cost of the 
new configuration. (More formally, the latter quantity is the optimal cost of serving past 
requests and ending in that configuration.) Thus one can think of WFA as a combination 
of two greedy strategies: one that minimizes the cost of the current move, and one that 
chooses the best configuration to be in. 

Chrobak and Larmore [3,4] proved that WFA is 2-competitive for k = 2. Their
approach was based on a new technique, which involves introducing an
algorithm-independent quantity called pseudocost that provides an upper bound on WFA’s cost, and
on estimating the pseudocost instead of the actual cost of WFA. The pseudocost approach
can also be effectively used to prove that WFA is competitive for other problems, for
example Task Systems [4].

For $k \ge 3$, a major breakthrough was achieved by Koutsoupias and Papadimitriou
[5,6], who proved that WFA is (2k − 1)-competitive for k servers in arbitrary metric
spaces. Their proof was based on the pseudocost method.

The main result of this paper is that WFA is 3-competitive for 3 servers in the 
Manhattan plane. Our research builds on the work from [3,4,6]. The proof uses the 
pseudocost method. The main difficulty in estimating the pseudocost is in finding an 
appropriate potential function. We construct a potential function $\Phi$, and we formulate
certain conditions on a metric space M under which $\Phi$ provides a certificate that WFA is
3-competitive in M. Then we show that the Manhattan plane satisfies these conditions,
and we conclude that WFA is 3-competitive in the Manhattan plane for 3 servers. Since
the Euclidean metric can be approximated by the city-block metric, this also gives a
$3\sqrt{2}$-competitive algorithm for 3 servers in the Euclidean plane.



2 Preliminaries 



Let M be a metric space. For points $x, y \in M$, we write xy to denote the distance
between x and y. Unordered k-tuples of points in M will be called configurations, and
they represent positions of our k servers. Configurations will be denoted by capital letters
X, Y, ... The configuration space is itself a metric space under the minimum-matching
metric. We write XY to denote the minimum-matching distance between X and Y. For
simplicity, we assume that the initial server configuration $S^0$ is fixed. Without loss of
generality, we allow the algorithms to move any number of servers before or after each
request, as long as between the times when two requests are issued, at least one of the
servers visits the last request point. It is a simple exercise to verify that this additional
freedom does not change the problem, but it makes the definitions associated with work
functions easier to handle.

Work functions. Work functions provide information about the optimal cost of serving
the past request sequence. For a request sequence $\varrho$, by $\omega_\varrho(X)$ we denote the minimum
cost of serving $\varrho$ and ending in configuration X. We refer to $\omega_\varrho$ as the work function
after $\varrho$. We use notation $\omega$ to denote any work function $\omega_\varrho$, for some request sequence
$\varrho$. Immediately from the definition of work functions we get that the optimal cost to
service $\varrho$ is $opt(\varrho) = \min_X \omega_\varrho(X)$.
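As an illustration of the minimum-matching distance XY between configurations, the following sketch computes it by brute force over all matchings. The function names and the choice of the city-block metric as the default are assumptions made for this example only; the brute force is only sensible for the small k considered in this paper.

```python
from itertools import permutations


def manhattan(p, q):
    """City-block distance between two points of the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def config_distance(X, Y, dist=manhattan):
    """Minimum-matching distance XY between two k-point configurations.

    Tries all k! ways of matching the points of X to the points of Y
    and returns the cheapest total distance.
    """
    X, Y = list(X), list(Y)
    if len(X) != len(Y):
        raise ValueError("configurations must have the same size")
    return min(
        sum(dist(x, y) for x, y in zip(X, perm))
        for perm in permutations(Y)
    )
```

For example, `config_distance([(0, 0), (2, 0)], [(2, 1), (0, 0)])` matches (0, 0) with itself and (2, 0) with (2, 1), so only one server has to move.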

For given $\varrho$, we can compute $\omega_\varrho$ using simple dynamic programming. Initially,
$\omega_\epsilon(X) = S^0 X$, for each configuration X ($\epsilon$ is the empty request sequence). For a
non-empty request sequence $\varrho$, if r is the last request in $\varrho$, write $\varrho = \sigma r$. Then $\omega_\varrho$ can
be computed recursively as $\omega_\varrho = \omega_\sigma \wedge r$, where “$\wedge$” is the update operator defined as
follows:

\[ (\omega \wedge r)(X) = \min_{Y \ni r} \{\omega(Y) + YX\} \qquad (1) \]
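The dynamic program behind equation (1) can be sketched on a finite metric space as follows. This is only an illustrative sketch: the dictionary representation of a work function, the helper names, and the restriction to a finite point set are assumptions of the example, not part of the paper.

```python
from itertools import combinations_with_replacement, permutations


def config_distance(X, Y, d):
    """Minimum-matching distance between two configurations (brute force)."""
    return min(sum(d(x, y) for x, y in zip(X, p)) for p in permutations(Y))


def initial_work_function(points, k, S0, d):
    """Work function of the empty request sequence: w(X) = S0 X."""
    configs = combinations_with_replacement(sorted(points), k)
    return {X: config_distance(S0, X, d) for X in configs}


def update(w, r, d):
    """Update operator of equation (1): (w ^ r)(X) = min over Y containing r."""
    with_r = [Y for Y in w if r in Y]
    return {
        X: min(w[Y] + config_distance(Y, X, d) for Y in with_r)
        for X in w
    }
```

With points 0, 1, 3 on a line, two servers starting at (0, 1) and a single request at 3, `min(update(w0, 3, d).values())` gives the optimal cost of serving that request, which is attained by moving the server at 1.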

Note that $|\omega(X) - \omega(Y)| \le XY$ for any work function $\omega$ and any configurations X
and Y. We call this inequality the Lipschitz property. Koutsoupias and Papadimitriou
[5,6] proved that work functions also satisfy the following quasiconvexity property:

\[ \omega(X) + \omega(Y) \ge \max_{x \in X - Y} \min_{y \in Y - X} \{\omega(X - x + y) + \omega(Y - y + x)\} \qquad (2) \]

The Lipschitz property and quasiconvexity will be used extensively in our calculations.

The Work Function Algorithm. We define the Work Function Algorithm (WFA) to be
an algorithm which chooses its service of the request sequence $\varrho$ as follows: Suppose
that WFA is in configuration S, and that the current work function is $\omega$. On request r,
WFA chooses that $x \in S$ which minimizes $xr + \omega \wedge r(S - x + r)$, and moves the server
from x to r.

WFA can be seen as a “linear combination” of two greedy strategies: one that
minimizes the cost xr in the given step, and one that chooses the optimal configuration after
r, that is, the configuration S − x + r that minimizes $\omega \wedge r(S - x + r)$. Neither of these
two greedy strategies is competitive.

Since $r \in S - x + r$, we have $\omega \wedge r(S - x + r) = \omega(S - x + r)$, so WFA can
as well minimize $xr + \omega(S - x + r)$. Yet another possible formulation is to move to a
new configuration S' that contains r and minimizes $SS' + \omega(S')$. It is not hard to show
(by induction on the number of requests) that this is equivalent to our formulation, since
WFA will only move one server at a time.
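A single step of WFA, as defined above, can be sketched on a finite metric space like this. The dictionary-based work function, the function names, and the tie-breaking rule (first minimizer wins) are assumptions of this example; the paper does not prescribe an implementation.

```python
from itertools import combinations_with_replacement, permutations


def config_distance(X, Y, d):
    return min(sum(d(x, y) for x, y in zip(X, p)) for p in permutations(Y))


def initial_work_function(points, k, S0, d):
    configs = combinations_with_replacement(sorted(points), k)
    return {X: config_distance(S0, X, d) for X in configs}


def update(w, r, d):
    with_r = [Y for Y in w if r in Y]
    return {X: min(w[Y] + config_distance(Y, X, d) for Y in with_r) for X in w}


def wfa_step(S, w, r, d):
    """Serve request r from configuration S (a sorted tuple).

    Picks the server x in S minimizing xr + (w ^ r)(S - x + r) and
    returns (new configuration, movement cost, updated work function).
    """
    w_r = update(w, r, d)
    best = None
    for i, x in enumerate(S):
        T = tuple(sorted(S[:i] + (r,) + S[i + 1:]))
        score = d(x, r) + w_r[T]
        if best is None or score < best[0]:
            best = (score, T, d(x, r))
    _, T, cost = best
    return T, cost, w_r


def run_wfa(points, S0, requests, d):
    """Return (total WFA cost, optimal offline cost) for a request sequence."""
    S = tuple(sorted(S0))
    w = initial_work_function(points, len(S), S, d)
    total = 0
    for r in requests:
        S, cost, w = wfa_step(S, w, r, d)
        total += cost
    return total, min(w.values())
```

The second return value of `run_wfa` is the minimum of the final work function, that is, the optimal cost of the sequence, so the two numbers can be compared directly on small instances.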

The pseudocost method. The pseudocost is a function that provides an upper bound on
WFA’s cost. More accurately, the pseudocost actually bounds the sum of WFA’s and the
optimal costs. Since the pseudocost is algorithm-independent, it is much easier to deal
with than the actual cost of WFA.

For any work function $\omega$ and $r \in M$, we consider the maximum increase of the work
function if r is requested:

\[ \nabla_r(\omega) = \max_X \{\omega \wedge r(X) - \omega(X)\} \]

Suppose that $\varrho = r^1 \ldots r^n$, and let $\omega^t$ denote the work function after requests $r^1 \ldots r^t$,
that is $\omega^t = \chi_{S^0} \wedge r^1 \ldots r^t$ (here $\chi_{S^0} = \omega_\epsilon$ is the initial work function). The pseudocost
of $\varrho$ is defined as $\nabla_\varrho = \sum_{t=1}^{n} \nabla_{r^t}(\omega^{t-1})$.
The lemma below establishes the relationship between the cost of WFA and the
pseudocost.

Lemma 1. [3,6] $\nabla_\varrho \ge cost_{WFA}(\varrho) + opt(\varrho)$.

According to the above lemma, in order to prove that WFA is C-competitive, it is
sufficient to show that the pseudocost is (C + 1)-competitive. To prove the latter fact,
we use a standard potential argument, formalized in the lemma below.

Lemma 2. Let $\Phi_{\omega,r} \in \mathbb{R}$ be defined for each work function $\omega$ with the last request r.
Suppose that $\Phi_{\omega,r}$ satisfies the following properties:

(OP) $\Phi_{\omega,r} + (C + 1) \min(\omega) \ge 0$

(UP) If $\mu = \omega \wedge s$ then $\Phi_{\mu,s} + \nabla_s(\omega) \le \Phi_{\omega,r}$

Then WFA is C-competitive on M.

Conditions (OP) and (UP) are referred to as the offset property and the update property,
respectively. Function $\Phi_{\omega,r}$ is called a potential function.

Proof. We use the notation from the previous lemma. Let also $\Phi^t = \Phi_{\omega^t, r^t}$. The proof
is by amortized summation:

\[ \nabla_\varrho = \sum_{t=1}^{n} \nabla_{r^t}(\omega^{t-1}) \le \sum_{t=1}^{n} (\Phi^{t-1} - \Phi^t) = \Phi^0 - \Phi^n \le (C + 1)\, opt(\varrho) + \Phi^0. \]

In the last step we used the offset property (OP). $\Phi^0$ is independent of $\varrho$. By Lemma 1,
it follows that WFA is C-competitive. ■
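The pseudocost used in the argument above, namely the sum over all requests of the maximum increase of the work function, can be computed on a small finite metric space with the following sketch. The dictionary-based work function and all function names are assumptions of this example.

```python
from itertools import combinations_with_replacement, permutations


def config_distance(X, Y, d):
    return min(sum(d(x, y) for x, y in zip(X, p)) for p in permutations(Y))


def initial_work_function(points, k, S0, d):
    configs = combinations_with_replacement(sorted(points), k)
    return {X: config_distance(S0, X, d) for X in configs}


def update(w, r, d):
    with_r = [Y for Y in w if r in Y]
    return {X: min(w[Y] + config_distance(Y, X, d) for Y in with_r) for X in w}


def pseudocost(points, S0, requests, d):
    """Return (pseudocost, optimal cost) of a request sequence.

    The pseudocost adds, for every request r, the maximum over all
    configurations X of (w ^ r)(X) - w(X).
    """
    w = initial_work_function(points, len(S0), S0, d)
    total = 0
    for r in requests:
        w_next = update(w, r, d)
        total += max(w_next[X] - w[X] for X in w)
        w = w_next
    return total, min(w.values())
```

On the line metric with points 0, 1, 3, servers starting at (0, 1) and one request at 3, WFA pays 2 and the optimum is 2, so Lemma 1 requires the pseudocost to be at least 4 on this instance.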

A modified update property. Define the shadow of $\omega$ as

\[ \hat\omega(x) = \max_A \Big\{ \sum_{a \in A} xa - \omega(A) \Big\} \qquad (3) \]

A configuration A is called an $(\omega, x)$-maximizer if A maximizes the right-hand side
in (3), that is, $\hat\omega(x) = \sum_{a \in A} xa - \omega(A)$. If r is the last request in $\omega$, then
$\omega(A) = \omega(A - b + r) + br$, for some $b \in A$, and

\[ \sum_{a \in A} xa - \omega(A) = \sum_{a \in A} xa - br - \omega(A - b + r) \le \sum_{a \in A - b + r} xa - \omega(A - b + r). \]

We conclude that, without loss of generality, an $(\omega, x)$-maximizer contains the last
request.


Lemma 3. [6] Suppose that A is an $(\omega, x)$-maximizer. Then

(a) A is an $(\omega \wedge x, x)$-maximizer.

(b) A maximizes $\omega \wedge x(A) - \omega(A)$.

Proof. The proofs for (a) and (b) use the quasiconvexity property and are quite similar.
We give the proof for (b), and refer the reader to [6] for the proof of (a). (See also [4].)
To show (b), it is sufficient to prove that

\[ \omega \wedge x(A) + \omega(B) \ge \omega(A) + \omega \wedge x(B) \qquad (4) \]

for each configuration B. If $x \in B$ then $\omega(B) = \omega \wedge x(B)$ and (4) follows from
$\omega \wedge x(A) \ge \omega(A)$. Suppose $x \notin B$. Since A is an $(\omega, x)$-maximizer, $ax - \omega(A) \ge
bx - \omega(A - a + b)$ for all $a \in A$ and $b \notin A$. Then, using quasiconvexity, we have

\[ \begin{aligned}
\omega \wedge x(A) + \omega(B) &= \min_{a \in A} \{\omega(A - a + x) + ax + \omega(B)\} \\
&\ge \min_{a \in A} \min_{b \in B - A} \{\omega(A - a + b) + ax + \omega(B - b + x)\} \\
&\ge \min_{b \in B - A} \{\omega(A) + bx + \omega(B - b + x)\} \\
&\ge \omega(A) + \omega \wedge x(B)
\end{aligned} \]

and (4) follows. ■

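We now introduce a modified update property, which we will use instead of the
update property (UP) from Lemma 2. This will considerably simplify the calculations
in the next section.

Corollary 4. Let $\Phi_{\omega,r}$ satisfy the offset property (OP) and have the form
$\Phi_{\omega,r} = \hat\omega(r) + \Psi_{\omega,r}$. Suppose that $\Phi_{\omega,r}$ satisfies the following modified update property

(MUP) If $\mu = \omega \wedge s$ then $\hat\omega(s) + \Psi_{\mu,s} \le \Phi_{\omega,r}$

Then WFA is 3-competitive in M.

Proof. Let A be an $(\omega, s)$-maximizer, and suppose that (MUP) holds. From Lemma 3,
we have $\hat\mu(s) = \sum_{a \in A} sa - \mu(A)$ and $\nabla_s(\omega) = \mu(A) - \omega(A)$. Then,

\[ \begin{aligned}
\Phi_{\mu,s} + \nabla_s(\omega) &= \sum_{a \in A} sa - \mu(A) + \Psi_{\mu,s} + \mu(A) - \omega(A) \\
&= \sum_{a \in A} sa - \omega(A) + \Psi_{\mu,s} \\
&= \hat\omega(s) + \Psi_{\mu,s} \le \Phi_{\omega,r}
\end{aligned} \]

and, by Lemma 2, we conclude that WFA is 3-competitive. ■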



3 The Potential Function 

In this section and later throughout the paper we assume that k = 3. We now introduce
our potential function $\Phi$. We then show that $\Phi$ satisfies the offset property and that, under
certain conditions, it also satisfies the update property.

Main idea. Finding an appropriate potential function is the main difficulty in proving
competitiveness. One standard approach that applies to many online problems is to use
the “lazy adversary” idea: Assume that the adversary is in some configuration X, and
calculate the maximum cost of the algorithm on request sequences $\varrho \in X^*$. Clearly, the
potential has to be at least as large as the obtained quantity, so that the algorithm has
enough “savings” to pay for serving such sequences $\varrho$. Then subtract $(k + 1) \cdot \omega(X)$,
since the adversary will pay $\omega(X)$. From the analysis of spaces with k + 1 points, it can
be seen that we also need to add another quantity equal to the sum of all distances in X.
The value thus obtained is referred to as the lazy potential.

As a better illustration, start with k = 2. In that case, X = {x, y}, and, if we
additionally assume that x is the last request, we only have one lazy request sequence
in X to consider: $\varrho = (xy)^*$. Thus the potential will be the maximum, over all choices
of y, of WFA’s cost on $\varrho$, plus xy, and minus $3 \cdot \omega(x, y)$. This value can be expressed
by a closed-form expression and, indeed, this potential can be used to prove that WFA
is 2-competitive for 2 servers [3,4].

We now try to extend this idea to 3 servers. In this case, X = {x, y, z} where, again,
we assume that x is the last request. The main difficulty that arises for 3 servers is that
now there are infinitely many possible request sequences on points in X, and it is not
known whether the maximum cost of WFA on these sequences (or the pseudocost) can
be expressed in closed form.

The general idea of our proof is to choose lazy sequences in which the adversary
“reveals” his positions to the algorithm as late as possible. To this end, we only consider
sequences of the form $\varrho = (yx)^{i_1} z (yx)^{i_2} z \ldots$, where each $i_j$ is large enough so that after
requesting $(yx)^{i_j}$, requesting x or y does not change the work function. We call the
resulting potential the procrastinating potential. The derivation of the formula for the
procrastinating potential is rather technical and involved, and since we do not need the
derivation for our purpose, we state the formula without proof.

To simplify notation, throughout this section, we assume that $\omega$ is a work function
with the last request r. Then, if no ambiguity arises, we will write $\omega(x, y)$, instead of
$\omega(r, x, y)$. We will also write $\Phi_\omega$ instead of $\Phi_{\omega,r}$, etc. Letters a, b, c, d, p, q, e, f, possibly
with accents or subscripts, denote points in M. Let also

\[ \bar\omega(x) = \max_{a,a'} \{xa + xa' - \omega(a, a')\} \qquad (5) \]

By the comments following the definition of the shadow $\hat\omega(x)$ in the previous section,
there is a maximizer of $\omega$ that contains r. Thus $\hat\omega(x)$ and $\bar\omega(x)$ are very closely related,
namely $\hat\omega(x) = \bar\omega(x) + rx$. In particular, $\hat\omega(r) = \bar\omega(r)$. In this notation, the procrastinating
potential is

\[ \Phi_\omega = \bar\omega(r) + \sup_{p,d,d'} \{\bar\omega(p) + dd' - \omega(p, d) - \omega(p, d')\} \]

It is quite easy to show that $\Phi_\omega$ satisfies the offset property (OP). The rest of this section
focuses on the verification of the update property (UP).

Two other formulas. We now give two other formulas which we use to estimate our
potential.

\[ \Lambda_\omega = \sup_{p,q,e,e'} \{-rp + \bar\omega(p) - rq + \bar\omega(q) - \omega(p, q) + ee' - \omega(e, e')\} \]

\[ \Gamma_\omega = \sup_{p,q,d,d',f} \{-rp + \bar\omega(p) + rq + dd' - \omega(q, d) - \omega(q, d') + qf - \omega(p, f)\} \]




Fig. 1. A graphical representation of functions $\Phi$, $\Lambda$ and $\Gamma$.

The formulas for $\Phi$, $\Lambda$ and $\Gamma$ are illustrated graphically in Figure 1. In this diagram,
a solid line between x, y represents the distance xy, and a dashed line between x, y
represents $\omega(x, y)$. The work function values always appear as negative terms. The “+” or “−”
labels on solid lines show whether the corresponding distance is a positive or a negative
term.

Now we are ready to state conditions under which WFA is 3-competitive in a given 
metric space. 

Theorem 5. Let M be any metric space. If $\Lambda_\omega \le \Phi_\omega$ and $\Gamma_\omega \le \Phi_\omega$ for any work
function $\omega$ over M, then WFA for 3 servers on M is 3-competitive.

Proof. We use Corollary 4. First, we need to verify the offset property (OP). Suppose
that $\omega$ is minimized on configuration {r, a, b}. By choosing suitable points in the
formula for $\Phi_\omega$ we get

\[ \Phi_\omega \ge ra + rb - \omega(a, b) + aa + ab - \omega(a, b) + bb - \omega(a, b) - \omega(a, b) \ge -4\,\omega(a, b), \]

and (OP) follows.

We now verify the update property. Recall that, without loss of generality, the
maximizer contains the last request, that is, $\hat\omega(r) = \bar\omega(r)$. Therefore $\Phi_\omega$ is of the form
$\hat\omega(r) + \Psi_\omega$, where $\Psi_\omega$ denotes the supremum term in the definition of $\Phi_\omega$. By Corollary 4,
it is sufficient to show the following inequality, where $\mu = \omega \wedge s$:

\[ sr + sa + sa' - \omega(a, a') + pb + pb' - \mu(s, b, b') + dd' - \mu(s, p, d) - \mu(s, p, d') \le \Phi_\omega \qquad (6) \]

Before presenting the calculations for (6), we restate, in a more explicit form, the
formulas (1) and (2). First, for k = 3, the update operation takes the form:

\[ \mu(s, x, y) = \min \{\, \omega(x, y) + rs,\; \omega(s, x) + ry,\; \omega(s, y) + rx \,\} \qquad (7) \]

for all x, y. The quasiconvexity property implies that

\[ \omega(x, y) + \omega(u, v) \ge \min \{\, \omega(x, u) + \omega(y, v),\; \omega(x, v) + \omega(y, u) \,\} \qquad (8) \]

for any $x, y, u, v \in M$.

We are now ready to prove (6). The proof is by analysis of cases, depending on which
of the three choices in equation (7), for each of $\mu(s, b, b')$, $\mu(s, p, d)$ and $\mu(s, p, d')$,
realizes the minimum. Denote by LS the left side of equation (6).

Case 1: $\mu(s, b, b') = \omega(b, b') + rs$, $\mu(s, p, d) = \omega(p, d) + rs$, and $\mu(s, p, d') = \omega(p, d') + rs$. Then

\[ \begin{aligned}
LS &= sa - rs + sa' - rs - \omega(a, a') + pb + pb' - \omega(b, b') + dd' - \omega(p, d) - \omega(p, d') \\
&\le ra + ra' - \omega(a, a') + pb + pb' - \omega(b, b') + dd' - \omega(p, d) - \omega(p, d') \\
&\le \Phi_\omega
\end{aligned} \]

Case 2: $\mu(s, b, b') = \omega(b, b') + rs$, $\mu(s, p, d) = \omega(p, d) + rs$, and $\mu(s, p, d') = \omega(s, p) + rd'$.
By quasiconvexity and symmetry between a, a', we can assume that
$\omega(a, a') + \omega(p, d) \ge \omega(a, d) + \omega(p, a')$. Then

\[ \begin{aligned}
LS &= sa - rs + dd' - rd' - \omega(a, a') + pb + pb' - \omega(b, b') + sa' - \omega(p, d) - \omega(p, s) \\
&\le ra + rd - \omega(a, d) + pb + pb' - \omega(b, b') + sa' - \omega(p, s) - \omega(p, a') \\
&\le \Phi_\omega
\end{aligned} \]



Case 3: $\mu(s, b, b') = \omega(b, b') + rs$, $\mu(s, p, d) = \omega(p, d) + rs$, and $\mu(s, p, d') = \omega(s, d') + rp$. Then

\[ LS = sa + sa' - \omega(a, a') - rs + pb + pb' - rp - \omega(b, b') + dd' - \omega(s, d') - \omega(p, d) \]

If $\omega(s, d') + \omega(p, d) \ge \omega(s, p) + \omega(d, d')$, then

\[ \begin{aligned}
LS &\le -rs + sa + sa' - \omega(a, a') - rp + pb + pb' - \omega(b, b') - \omega(s, p) + dd' - \omega(d, d') \\
&\le \Lambda_\omega \le \Phi_\omega
\end{aligned} \]

Otherwise, by quasiconvexity, $\omega(s, d') + \omega(p, d) \ge \omega(s, d) + \omega(p, d')$. Then

\[ \begin{aligned}
2 \cdot LS &\le sa + sa' - 2rs - \omega(a, a') + pb + pb' - \omega(b, b') + dd' - \omega(p, d) - \omega(p, d') \\
&\quad + pb + pb' - 2rp - \omega(b, b') + sa + sa' - \omega(a, a') + dd' - \omega(s, d) - \omega(s, d') \\
&\le 2 \cdot \Phi_\omega
\end{aligned} \]

because $sa + sa' - 2rs \le ra + ra'$ and $pb + pb' - 2rp \le rb + rb'$.

Case 4: $\mu(s, b, b') = \omega(b, b') + rs$, $\mu(s, p, d) = \omega(s, p) + rd$, and $\mu(s, p, d') = \omega(s, p) + rd'$.
By quasiconvexity and symmetry, we can assume that $\omega(a, a') + \omega(s, p) \ge \omega(s, a) + \omega(p, a')$.
Since $dd' \le rd + rd'$, we have

\[ \begin{aligned}
LS &\le sa + sa' - \omega(a, a') + pb + pb' - \omega(b, b') - 2\omega(s, p) \\
&\le sa - \omega(s, a) + pb + pb' - \omega(b, b') + sa' - \omega(s, p) - \omega(p, a') \\
&\le \Phi_\omega
\end{aligned} \]

because $sa \le rs + ra$.

Case 5: $\mu(s, b, b') = \omega(b, b') + rs$, $\mu(s, p, d) = \omega(s, p) + rd$, and $\mu(s, p, d') = \omega(s, d') + rp$.
By quasiconvexity and symmetry, we can assume that $\omega(b, b') + \omega(s, d') \ge \omega(s, b) + \omega(b', d')$. Then

\[ \begin{aligned}
LS &= sa + sa' - \omega(a, a') + pb + pb' - rp - \omega(b, b') + dd' - rd - \omega(s, p) - \omega(s, d') \\
&\le rb' + rd' - \omega(b', d') + sa + sa' - \omega(a, a') + pb - \omega(s, p) - \omega(s, b) \\
&\le \Phi_\omega
\end{aligned} \]

The proof for the remaining cases will be given in the full version of the paper. ■



4 WFA in the Manhattan Plane 



In this section we assume that k = 3 and the given metric space is the Manhattan plane
$\mathbb{R}^2_1$, that is $\mathbb{R}^2$ with the city-block metric: $xy = |x^1 - y^1| + |x^2 - y^2|$, where $x^1, x^2$
denote the coordinates of a point x.

We say that $(x_1, \ldots, x_m)$ is a linear m-tuple if $x_i x_j + x_j x_\ell = x_i x_\ell$ for all
$1 \le i \le j \le \ell \le m$. If R is a finite set of ordered pairs, we say that R is a parallel bundle if there
exist points x, y such that (x, a, b, y) is a linear 4-tuple for all $(a, b) \in R$. We also say that
the pairs in R are parallel. We use repeated double bars for a finite parallel bundle. For
example we write $(a, b) \parallel (c, d) \parallel (e, f)$ to indicate that {(a, b), (c, d), (e, f)} is a parallel bundle.

Similarly, if R is a set of unordered pairs, we say that R is a parallel bundle if each of
the pairs can be ordered so that the resulting set of ordered pairs is a parallel bundle.



Lemma 6. Let P be a finite set of pairs of points in the Manhattan plane. Then P is a
union of two parallel bundles.

Proof. Order each pair in P, so that $(a, b) \in P$ implies $a^1 \le b^1$. Define B (resp. C)
to be the set of all pairs in P, where $a^2 \le b^2$ (resp. $b^2 < a^2$). Pick $N \in \mathbb{R}$ such that
$N \ge |a^i|$ and $N \ge |b^i|$ for all $(a, b) \in P$ and i = 1, 2. Define x, y by x = (−N, −N)
and y = (N, N). Then, for any $(a, b) \in B$, (x, a, b, y) is a linear 4-tuple. Similarly
define u = (−N, N) and v = (N, −N). Then, for any $(a, b) \in C$, (u, a, b, v) is a linear
4-tuple. ■
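The construction in the proof of Lemma 6 can be checked mechanically. The sketch below splits a set of pairs into the two bundles and verifies the linear 4-tuples against the corner points; the function names are hypothetical, and integer coordinates are assumed so that the equalities can be tested exactly.

```python
from itertools import combinations


def manhattan(p, q):
    """City-block distance between two points of the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def is_linear(points):
    """True if x_i x_j + x_j x_l = x_i x_l for all i < j < l."""
    return all(
        manhattan(a, b) + manhattan(b, c) == manhattan(a, c)
        for a, b, c in combinations(points, 3)
    )


def two_parallel_bundles(pairs):
    """Split pairs into bundles B and C as in the proof of Lemma 6."""
    # Orient every pair so that the first coordinate does not decrease.
    oriented = [(a, b) if a[0] <= b[0] else (b, a) for a, b in pairs]
    B = [(a, b) for a, b in oriented if a[1] <= b[1]]
    C = [(a, b) for a, b in oriented if a[1] > b[1]]
    # N bounds every coordinate in absolute value.
    N = max((abs(c) for a, b in oriented for p in (a, b) for c in p), default=0)
    x, y = (-N, -N), (N, N)
    u, v = (-N, N), (N, -N)
    assert all(is_linear((x, a, b, y)) for a, b in B)
    assert all(is_linear((u, a, b, v)) for a, b in C)
    return B, C
```

Pairs that rise from left to right end up in B and pairs that fall end up in C, so every pair is covered by one of the two bundles.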



Theorem 7. WFA is 3-competitive for 3 servers in the Manhattan plane.

Proof. By Theorem 5, it is sufficient to show that $\Lambda_\omega \le \Phi_\omega$ and $\Gamma_\omega \le \Phi_\omega$.

Proof of inequality $\Gamma \le \Phi$. Pick points p, b, b', q, d, d', f such that

\[ \Gamma_\omega = -rp + pb + pb' - \omega(b, b') + rq + dd' - \omega(q, d) - \omega(q, d') + qf - \omega(p, f) \]

The intuition behind the proof is that, by Lemma 6, the four pairs of points {f, q},
{b, p}, {b', p}, and {d, d'} must fall into at most two parallel bundles. Depending on
how they fall, we take advantage of the parallel relationships between the pairs to obtain
the result.

Case 1: $(b, p) \parallel (p, b')$. Pick x, y such that (x, b, p, y) and (x, p, b', y) are linear 4-tuples. Then
(x, b, p, b', y) is a linear 5-tuple. Since $bp + b'p = bb' \le rb + rb'$ and $rq - rp \le pq$, we get

\[ \begin{aligned}
\Gamma_\omega &\le rb + rb' - \omega(b, b') + qf + qp - \omega(f, p) + dd' - \omega(q, d) - \omega(q, d') \\
&\le \Phi_\omega.
\end{aligned} \]

Case 2: $(d, d') \parallel (f, q)$. Pick a point x such that (x, d, d') and (x, f, q) are linear triples. Then
$dd' = xd' - xd$ and $fq = xq - xf$. By quasiconvexity, without loss of generality,
$\omega(b, b') + \omega(q, d) \ge \omega(b, d) + \omega(q, b')$. Therefore

\[ \begin{aligned}
\Gamma_\omega &\le rq + pb' - rp - \omega(b', q) + xd' + xq - \omega(d', q) + bp - xd - \omega(b, d) - xf - \omega(f, p) \\
&\le rq + rb' - \omega(b', q) + xd' + xq - \omega(d', q) + bp - \omega(x, b) - \omega(x, p) \\
&\le \Phi_\omega
\end{aligned} \]

The analysis of the remaining cases will be given in the full version of this paper.
Proof of inequality $\Lambda \le \Phi$. Pick p, b, b', q, c, c', e, e' such that

\[ \Lambda_\omega = -rp + pb + pb' - \omega(b, b') - rq + qc + qc' - \omega(c, c') - \omega(p, q) + ee' - \omega(e, e') \]

The intuition behind the proof is that, by Lemma 6, the five pairs of points {e, e'}, {b, p},
{b', p}, {q, c}, and {q, c'} must fall into at most two parallel bundles. Depending on
how they fall, we take advantage of the parallel relationships between the pairs to obtain
the result.

Case 1: $(b, p) \parallel (p, b')$. Pick x, y such that (x, b, p, y) and (x, p, b', y) are linear 4-tuples. Then
(x, b, p, b', y) is a linear 5-tuple, implying that $bp + pb' = bb'$. Without loss of generality,
$\omega(b, b') + \omega(p, q) \ge \omega(b, p) + \omega(b', q)$. Therefore

\[ \begin{aligned}
\Lambda_\omega &\le ee' - \omega(e, e') + qc + qc' - \omega(c, c') + bb' - \omega(b', q) - \omega(b, p) - rp - rq \\
&\le re + re' - \omega(e, e') + qc + qc' - \omega(c, c') + bb' - \omega(b', q) - \omega(b, q) \\
&\le \Phi_\omega.
\end{aligned} \]

Case 2: $(e, e') \parallel (b, p) \parallel (c, q)$. Pick x such that (x, e, e'), (x, b, p) and (x, c, q) are linear triples,
implying that $ee' = xe' - xe$, $bp = xp - xb$ and $cq = xq - xc$. By quasiconvexity,
without loss of generality, $\omega(e, e') + \omega(p, q) \ge \omega(e, p) + \omega(e', q)$. Therefore

\[ \begin{aligned}
\Lambda_\omega &\le xp - rp + qc' - rq - xc - \omega(c, c') + xe' + xq - \omega(e', q) + b'p - ex - \omega(e, p) - xb - \omega(b, b') \\
&\le rx + rc' - \omega(x, c') + xe' + xq - \omega(e', q) + b'p - \omega(x, p) - \omega(x, b') \\
&\le \Phi_\omega
\end{aligned} \]

The analysis of the remaining cases will be given in the full version of this paper.



Corollary 8. There is a $3\sqrt{2}$-competitive algorithm for 3 servers in the Euclidean plane.

Proof. Write $\|x, y\|_2$ and $\|x, y\|_1$ for the Euclidean and the city-block metric,
respectively. Then $\frac{1}{\sqrt{2}} \|x, y\|_1 \le \|x, y\|_2 \le \|x, y\|_1$, for any two points $x, y \in \mathbb{R}^2$. For any
request sequence in the plane, pretend that the metric is the city-block metric and follow
WFA. The algorithm’s cost in $\mathbb{R}^2_2$ is at most its cost in $\mathbb{R}^2_1$. The optimal cost in $\mathbb{R}^2_1$ is at
most $\sqrt{2}$ times the optimal cost in $\mathbb{R}^2_2$. The result follows immediately from Theorem 7. ■
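The two-sided comparison between the city-block and Euclidean metrics used in this proof is easy to check numerically. The sketch below uses hypothetical helper names and a small floating-point tolerance, which is an assumption of the example.

```python
import math


def l1(x, y):
    """City-block distance in the plane."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def l2(x, y):
    """Euclidean distance in the plane."""
    return math.hypot(x[0] - y[0], x[1] - y[1])


def sandwich_holds(x, y, tol=1e-12):
    """Check l1 / sqrt(2) <= l2 <= l1 up to rounding error."""
    return (
        l1(x, y) / math.sqrt(2) <= l2(x, y) + tol
        and l2(x, y) <= l1(x, y) + tol
    )
```

The lower bound is tight along the diagonals, for instance between (0, 0) and (1, 1), and the upper bound is tight along the axes, which is where the factor $\sqrt{2}$ in the corollary comes from.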



5 Final Comments 

We proved that WFA is 3-competitive for 3 servers in the Manhattan plane. This
immediately implies that WFA is also 3-competitive in $\mathbb{R}^2_\infty$, the plane with the supnorm
metric, because $\mathbb{R}^2_\infty$ and $\mathbb{R}^2_1$ are isometric.
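One standard isometry between the city-block plane and the supnorm plane is the linear map sending $(x^1, x^2)$ to $(x^1 + x^2, x^1 - x^2)$. The paper does not name the map it has in mind, so the choice below is an assumption; the sketch simply checks that this map preserves distances.

```python
def l1(x, y):
    """City-block distance in the plane."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


def linf(x, y):
    """Supnorm distance in the plane."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def to_supnorm_plane(p):
    """Map a point of the city-block plane to the supnorm plane."""
    return (p[0] + p[1], p[0] - p[1])
```

The check rests on the identity max(|a + b|, |a − b|) = |a| + |b|, applied to the coordinate differences of the two points.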

We believe that our method can be used to prove that the WFA is 3-competitive in
other metric spaces of interest. According to Theorem 5, in order to prove this result,
one only needs to prove that $\Lambda, \Gamma \le \Phi$. We conjecture that these inequalities are true
on the circle. We also conjecture that the same method will work in $\mathbb{R}^2_2$ (improving the
ratio from Corollary 8 to 3).

Theorem 5 does not apply to arbitrary metric spaces. We have an example of a
metric space M and a work function $\omega$ for which the hypothesis of Theorem 5 fails.
Nevertheless, by pursuing further our approach and studying lazy adversary sequences
that generalize the procrastinating adversary, it may be possible to obtain even better
potential functions that work in arbitrary metric spaces.

Abstract. A critical step in all quartet methods for constructing evolutionary trees
is the inference of the topology for each set of four sequences (i.e. quartet). It is
a well-known fact that all quartet topology inference methods make mistakes
that result in the incorrect inference of quartet topology. These mistakes are called
quartet errors. In this paper, two efficient algorithms for correcting bounded
numbers of quartet errors are presented. These “quartet cleaning” algorithms are
shown to be optimal in that no algorithm can correct more quartet errors. An extensive
simulation study reveals that sets of quartet topologies inferred by three
popular methods (Neighbor Joining [15], Ordinal Quartet [14] and Maximum Parsimony
[10]) almost always contain quartet errors and that a large portion of these
quartet errors are corrected by the quartet cleaning algorithms.



1 Introduction 

The explosion in the amount of DNA sequence data now available [3] has made it possible
for biologists to address important large scale evolutionary questions [11,12,19]. In the
analysis of this data an evolutionary tree T that describes the evolutionary history of the 
set S of sequences involved is produced. T is modeled by an edge-weighted rooted tree 
where the leaves are labeled bijectively by sequences in S. Due to the large data sets 
involved, standard approaches for constructing evolutionary trees, such as maximum 
likelihood [9] and maximum parsimony [18], that exhaustively search the entire tree 
space are not feasible. 

In recent years quartet methods for constructing evolutionary trees have received
much attention in the computational biology community [1,2,4,8,14,17]. Given a quartet
of sequences {a, b, c, d} and an evolutionary tree T, the quartet topology induced in
T by {a, b, c, d} is the path structure connecting a, b, c and d in T. Given a quartet
{a, b, c, d}, if the path in T connecting sequences a and b is disjoint from the path in
T connecting sequences c and d then the quartet is said to be resolved and is denoted
ab|cd. Otherwise, the quartet is said to be unresolved and is denoted (abcd). The four
possible quartet topologies that can be induced by a quartet are depicted in Fig. 1.
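The definition of the induced quartet topology can be made concrete with a short sketch. It assumes an unrooted tree given as an adjacency dictionary (an assumption of this example) and uses the fact that, in a tree, the path from a to b is disjoint from the path from c to d exactly when the sum of their lengths is strictly smaller than the sums for the other two pairings.

```python
from collections import deque


def distances_from(tree, source):
    """Breadth-first search distances (in edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in tree[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def quartet_topology(tree, a, b, c, d):
    """Return 'ab|cd', 'ac|bd', 'ad|bc' or '(abcd)' for the given leaves."""
    dist = {x: distances_from(tree, x) for x in (a, b, c, d)}
    sums = {
        f"{a}{b}|{c}{d}": dist[a][b] + dist[c][d],
        f"{a}{c}|{b}{d}": dist[a][c] + dist[b][d],
        f"{a}{d}|{b}{c}": dist[a][d] + dist[b][c],
    }
    smallest = min(sums.values())
    winners = [t for t, s in sums.items() if s == smallest]
    # A unique smallest sum means the two paths are disjoint (resolved).
    return winners[0] if len(winners) == 1 else f"({a}{b}{c}{d})"
```

When all four leaves hang off a single internal vertex the three sums coincide, and the function reports the unresolved topology.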



Fig. 1. The four quartet topologies for quartet {a, b, c, d}: ab|cd, ac|bd, ad|bc and (abcd).



Quartet methods are based upon the fact that the topology of an evolutionary tree T
(that is, T without its edge weights) is uniquely characterized by its set $Q_T$ of induced
quartet topologies [5] (see Fig. 2). This suggests the following three step process, referred
to as the quartet method paradigm, for estimating an unknown evolutionary tree T for
a set S of sequences:

1. For each quartet {a, b, c, d} of sequences in S, estimate the quartet topology induced
by {a, b, c, d} in T. The procedure for producing this estimate is called a quartet
topology inference method. Let Q be the set of $\binom{n}{4}$ inferred quartet topologies, where n = |S|.

2. The quartet topologies in Q are recombined to produce an estimate T' of T’s
topology. The procedure for producing T' is called a quartet recombination method.

3. T' is rooted and edge weights determined.




Fig. 2. An evolutionary tree T and its set $Q_T$ of induced quartet topologies.



The quartet method paradigm is illustrated in Fig. 3. There are many quartet topology
inference methods including maximum parsimony [10], maximum likelihood [9],
neighbor joining [15] and the ordinal quartet method [14]. The reader is directed to [18] for a
good overview of these methods. Notice that computationally intensive methods such as
maximum likelihood and maximum parsimony can be used to infer quartet topology but
are infeasible when used to infer the entire tree topology. Existing quartet recombination
methods include the Q* method [2], short quartet method [8], and quartet puzzling [17]
among others [1,13]. The third step in the quartet method is well-understood [4,18].

Fig. 3. The quartet method paradigm. (Diagram: sequences S = {a, b, c, d, e, f}; Step 1: infer all
quartet topologies, giving the set Q of inferred quartet topologies; Step 2: recombine quartet
topologies into the inferred evolutionary tree topology T'; Step 3: determine root and assign edge
lengths, giving the inferred evolutionary tree. Quartet cleaning is applied to Q to correct quartet errors.)



The algorithmic interest in the quartet method paradigm derives from the fact that
quartet topology inference methods make mistakes, and so, the set Q of inferred quartet
topologies contains quartet errors. The quartet $\{a, b, c, d\}$ is a quartet error if
$ab|cd \in Q_T$ but $ab|cd \notin Q$. For example, in Fig. 3, $\{a, b, c, e\}$ is a quartet error
since $ab|ce \in Q_T$ but $ac|be \in Q$. In this sense, Q is an estimate of $Q_T$. Consequently,
the problem of recombining quartet topologies of Q to form an estimate T' of T is typically
formulated as an optimization problem:

Maximum Quartet Consistency (MQC)

Instance: Set Q containing a quartet topology for each quartet of sequences in S and
$k \in \mathbb{N}$.

Question: Is there an evolutionary tree T' labeled by S such that $|Q_{T'} \cap Q| \geq k$?
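
The quantities in this formulation can be made concrete with a small sketch. The tree, the helper names (`hop_distances`, `topo`, `induced_quartets`, `mqc_score`) and the dictionary encoding of Q are illustrative assumptions, not part of the paper; the induced topology is read off hop-count distances in a binary tree, where the pairing separated by the joining path has the strictly smallest distance sum.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topo(a, b, c, d):
    """Order-independent encoding of the quartet topology ab|cd."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def induced_quartets(adj, leaves):
    """Map every quartet of leaves to the topology induced by a binary tree."""
    dist = {x: hop_distances(adj, x) for x in leaves}
    result = {}
    for a, b, c, d in combinations(sorted(leaves), 4):
        # The pairing separated by the joining path has the smallest
        # sum of within-pair distances.
        pairings = [(a, b, c, d), (a, c, b, d), (a, d, b, c)]
        w, x, y, z = min(pairings, key=lambda p: dist[p[0]][p[1]] + dist[p[2]][p[3]])
        result[frozenset([a, b, c, d])] = topo(w, x, y, z)
    return result


def mqc_score(tree_quartets, Q):
    """Number of quartets on which the tree and Q agree, i.e. |Q_T' ∩ Q|."""
    return sum(1 for q, t in tree_quartets.items() if Q.get(q) == t)


# Hypothetical tree: leaves a, b hang off u; c off v; d, e off w; path u - v - w.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["w"], "e": ["w"],
       "u": ["a", "b", "v"], "v": ["u", "c", "w"], "w": ["v", "d", "e"]}
Q_T = induced_quartets(adj, "abcde")
Q = dict(Q_T)
Q[frozenset("abce")] = topo("a", "c", "b", "e")   # introduce one quartet error
print(mqc_score(Q_T, Q), "of", len(Q_T), "quartets agree")
```

In this encoding a quartet error is simply a key whose value in Q differs from its value in the dictionary of the true tree.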

Problem MQC is NP-hard if the input Q is a partial set of quartets, i.e., quartet topologies
can be missing [16] (this proof also implies that this version of MQC is MAX-SNP
hard). Though it was previously shown that MQC has a polynomial time approximation
scheme [13], proving that MQC is NP-hard turns out to be more difficult since a complete
set of quartets must be constructed¹:

Theorem 1. MQC is NP-hard. 

Clearly, the accuracy of the estimate T' depends almost entirely on the accuracy of Q. 
In fact, many quartet recombination methods are very sensitive to quartet errors in Q. 
For example, the Q* method [2] and the Short Quartet Method [8] can fail to recover the 
true evolutionary tree even if there is only one quartet error in Q. Hence, although much
effort has been directed towards the development of quartet recombination methods, of
greater importance is the development of methods that improve the accuracy of inferred
quartet topologies.

This paper presents research on the detection and correction of quartet errors in
Q. Methods for detecting and correcting quartet errors are called “quartet cleaning”
methods (see Fig. 3). The principal contributions of the research presented here are two
quartet cleaning methods, namely global edge cleaning and local vertex cleaning, and
an extensive simulation study that establishes both the need for these quartet cleaning
algorithms and their applicability.

¹ Several proofs have been omitted due to space constraints. Please contact the authors for details.

These results are described in more detail in Sect. 1.1 and contrasted with previous
results in Sect. 1.2.

1.1 Terminology and Results 

Let S be a set of sequences, Q be a set of quartet topologies and T the true evolutionary
tree for S. In this paper it can be assumed that Q contains a quartet topology for each 
quartet taken from S. Several concepts must be introduced so that quartet cleaning can 
be discussed in detail. 

An edge e in tree T induces the bipartition $(A, B)$ if $T - \{e\}$ consists of two trees
where one is labeled by A and the other by B. This is denoted $e = (A, B)$. Let $Q(A, B)$
denote the set of quartet topologies of the form $aa'|bb'$ where $a, a' \in A$ and $b, b' \in B$.
An internal vertex v in tree T induces the tripartition $(A, B, C)$ if $T - \{v\}$ consists
of three trees labeled by A, B and C, respectively. This is denoted $v = (A, B, C)$.
Let $Q(A, B, C)$ denote the union of $Q(A, B)$, $Q(A, C)$ and $Q(B, C)$. Two bipartitions
$(A, B)$ and $(C, D)$ are compatible if they can be induced in the same tree, i.e., either
$A \subseteq C$ or $C \subseteq A$. Similarly, two tripartitions are compatible if they can be induced in
the same tree.

In order to assess the performance of quartet cleaning algorithms an understanding
of the distribution of quartet errors in T is needed. Let $P_T(a, b)$ be the path between
sequences a and b in tree T. If $ab|cd$ is induced in T then $P_T(a, c) \cap P_T(b, d)$ is called the
joining path of $\{a, b, c, d\}$ in T. Notice that the joining path is necessarily non-empty
(see Fig. 1). Define the quartet error $\{a, b, c, d\}$ to be across edge e if e is on the joining
path of $\{a, b, c, d\}$ in T. Similarly, define the quartet error $\{a, b, c, d\}$ to be across vertex
v if v is on the joining path of $\{a, b, c, d\}$ in T. These definitions permit the assignment
of quartet errors in Q to edges/vertices of T.
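
As a rough illustration of these definitions, the joining path can be computed by intersecting two tree paths. The adjacency-list tree and the function names below (`tree_path`, `joining_path`, `is_across_edge`) are hypothetical and serve only as a sketch.

```python
def tree_path(adj, a, b):
    """Vertices on the unique path from a to b in a tree given as adjacency lists."""
    stack = [[a]]
    while stack:
        p = stack.pop()
        if p[-1] == b:
            return p
        for v in adj[p[-1]]:
            if len(p) < 2 or v != p[-2]:      # never step straight back
                stack.append(p + [v])
    raise ValueError("vertices are not connected")


def joining_path(adj, a, b, c, d):
    """P_T(a, c) ∩ P_T(b, d) for a quartet that the tree resolves as ab|cd."""
    on_bd = set(tree_path(adj, b, d))
    return [v for v in tree_path(adj, a, c) if v in on_bd]


def is_across_edge(adj, a, b, c, d, edge):
    """True if the edge (a pair of adjacent vertices) lies on the joining path."""
    jp = joining_path(adj, a, b, c, d)
    return any(set(edge) == {x, y} for x, y in zip(jp, jp[1:]))


# Hypothetical tree: leaves a, b hang off u; c off v; d, e off w; path u - v - w.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["w"], "e": ["w"],
       "u": ["a", "b", "v"], "v": ["u", "c", "w"], "w": ["v", "d", "e"]}
print(joining_path(adj, "a", "b", "d", "e"))
print(is_across_edge(adj, "a", "b", "c", "e", ("u", "v")))
```

An error on the quartet {a, b, c, e}, which this tree resolves as ab|ce, would thus be charged to the edge between u and v but not to the edge between v and w.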





Fig. 4. Cleaning bounds. 



Let $e = (X, Y)$ be the bipartition induced in T as depicted in Fig. 4. Observe that
$Q_T$ and $Q_{T'}$ differ by quartets of the form $ax|by$ where $x \in X$ and $y \in Y$. It follows
that $|Q_T - Q_{T'}| = (|X| - 1)(|Y| - 1)$. If half of the quartets of the form $\{a, b, x, y\}$
with $x \in X$ and $y \in Y$ have quartet topology $ax|by$ in Q and the other half have quartet
topology $bx|ay$ in Q then no quartet cleaning algorithm can guarantee that all quartet
errors across e can be corrected under the MQC principle of optimality. This implies
that $(|X| - 1)(|Y| - 1)/2$ is an upper bound on the number of quartet errors across an
edge of T that can be corrected. This example motivates the following formulations of
quartet cleaning:

Local Edge Cleaning: A local edge cleaning algorithm with edge cleaning bound b
corrects all quartet errors across any edge with fewer than b quartet errors across it.

Global Edge Cleaning: A global edge cleaning algorithm with edge cleaning bound b
corrects all quartet errors in Q if each edge of T has fewer than b quartet errors
across it.

Analogous definitions apply for local and global vertex cleaning algorithms. Local 
edge/vertex cleaning is more robust than global edge/vertex cleaning since it can be
applied to an edge/vertex independently of the number of quartet errors across other 
edges/vertices. This is a significant feature especially when some edges/vertices have a 
high number of quartet errors across them. In contrast, global cleaning algorithms are 
applicable only if all edges/vertices satisfy the cleaning bound. 

The example from Fig. 4 illustrates that cleaning bounds should not be constant but
vary with bipartition sizes. Hence, an edge $e = (X, Y)$ would have an edge cleaning
bound that depends on $|X|$ and $|Y|$. In particular, the example demonstrates that the
optimal edge cleaning bound is $(|X| - 1)(|Y| - 1)/2$.
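
To see how strongly the bound depends on the bipartition sizes, the following sketch evaluates it for two edges of a hypothetical tree on ten sequences. The function names are illustrative assumptions; exact fractions are used so that half-integer bounds are compared correctly.

```python
from fractions import Fraction


def edge_cleaning_bound(size_x, size_y):
    """(|X| - 1)(|Y| - 1)/2 for an edge inducing the bipartition (X, Y)."""
    return Fraction((size_x - 1) * (size_y - 1), 2)


def within_bound(errors_across, size_x, size_y):
    """True if the number of quartet errors across the edge is below the bound."""
    return errors_across < edge_cleaning_bound(size_x, size_y)


# Ten sequences: an edge next to a cherry (2 | 8) versus a central edge (5 | 5).
for x, y in [(2, 8), (5, 5)]:
    print((x, y), edge_cleaning_bound(x, y))
```

Under these assumptions the edge next to a cherry tolerates at most three quartet errors, whereas the central edge tolerates up to seven.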

The first contribution of the paper is an $O(n^4)$ time global edge cleaning algorithm
with edge cleaning bound $(|X| - 1)(|Y| - 1)/2$. Following the above remarks, this
algorithm is optimal in the number of quartet errors it can correct across an edge
$e = (X, Y)$. The global edge cleaning algorithm is presented in Sect. 2.1.

The second contribution of the paper is an $O(n^7)$ time local vertex cleaning algorithm
with vertex cleaning bound $(|X| - 1)(|Y| - 1)/4$. Although this algorithm has a smaller
cleaning bound than the global edge cleaning algorithm, it is more robust since it is local.
Hence, there are situations where the local vertex cleaning algorithm is superior to the
global edge cleaning algorithm and vice-versa. The local vertex cleaning algorithm is
presented in Sect. 2.3.

The third contribution of the paper is an extensive simulation study that assesses the 
utility of the above quartet cleaning algorithms. This study establishes the following: 

- Regardless of the quartet topology inference method used to obtain Q, quartet errors
are prevalent in sets of inferred quartet topologies. To establish this, three popular
quartet topology inference methods are evaluated: maximum parsimony [10], neighbor
joining [15] and the ordinal quartet method [14]. This establishes that there is
a need for quartet cleaning algorithms.

- The global edge cleaning algorithm and the local vertex cleaning algorithm are 
very effective at correcting quartet errors. In particular, the local vertex cleaning 
algorithm is more effective (due to its robustness) and both algorithms dramatically 
increase the accuracy of the inferred quartet topology set Q. 

The simulation study is presented in Sect. 3. 

1.2 Previous Results 

The idea of quartet cleaning was introduced in [13]. That paper presented a polynomial
time global edge quartet cleaning algorithm with cleaning bound $\alpha(|X| - 1)(|Y| - 1)/2$
where $\alpha > 0$ is a fixed constant. Hence, this algorithm is suboptimal. Although this
algorithm has a polynomial time complexity, it is of very high degree, and so, of primarily
theoretical interest. The algorithms presented here are more efficient, more effective (the
global edge cleaning algorithm is optimal) and more robust.

2 Quartet Cleaning Algorithms 

2.1 A Global Edge Quartet Cleaning Algorithm 

Let S be a set of sequences that evolved on evolutionary tree T and let Q be a set of quartet
topologies inferred from S. Assume that there are fewer than $(|A| - 1)(|B| - 1)/2$ quartet
errors across each edge $e = (A, B)$ of T. In other words, assume that
$|Q(A, B) - Q| < (|A| - 1)(|B| - 1)/2$. The following algorithm constructs T from Q by building T
iteratively from the leaves up. Note that for the purposes of this algorithm, a leaf in R
is considered to be a rooted subtree.

Algorithm Global-Clean(Q)

1. Let R := S.

2. For every pair of rooted subtrees $T_1$ and $T_2$ in R

3. Let A denote the leaf sequences of $T_1$ and $T_2$.

4. If $|Q(A, S - A) - Q| < (|A| - 1)(|S - A| - 1)/2$ then

5. Create a new tree T' that contains $T_1$ and $T_2$ as its subtrees.

6. Let $R := R - \{T_1, T_2\} \cup \{T'\}$.

7. Repeat step 2 until $|R| = 3$.

8. Connect the three subtrees in R at a new vertex and output the resulting unrooted tree.

Theorem 2. Algorithm Global-Clean outputs the tree T correctly. 

Note that if not all edges of T satisfy the global cleaning bound of $(|A| - 1)(|B| - 1)/2$
then Global-Clean fails to return an evolutionary tree.
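
The steps of Global-Clean can be sketched directly, without any of the optimizations of Sect. 2.2. The encoding below is an assumption made for illustration: Q is a dictionary from quartets to topologies, a subtree is represented only by its leaf set, and `quartets_from_splits` is a hypothetical helper that builds the quartet set of a binary tree from one side of each internal edge. The example tree and the injected error are invented.

```python
from itertools import combinations


def topo(a, b, c, d):
    """Order-independent encoding of the quartet topology ab|cd."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def quartets_from_splits(S, splits):
    """Quartet topologies of a binary tree given by one side A of each internal edge."""
    Q = {}
    for a, b, c, d in combinations(sorted(S), 4):
        for p, q, r, s in ((a, b, c, d), (a, c, b, d), (a, d, b, c)):
            left, right = {p, q}, {r, s}
            if any((left <= A and not right & A) or (right <= A and not left & A)
                   for A in splits):
                Q[frozenset([a, b, c, d])] = topo(p, q, r, s)
    return Q


def errors(A, S, Q):
    """|Q(A, S - A) - Q|: topologies aa'|bb' of the bipartition that are not in Q."""
    return sum(1 for a, a2 in combinations(sorted(A), 2)
               for b, b2 in combinations(sorted(S - A), 2)
               if Q[frozenset([a, a2, b, b2])] != topo(a, a2, b, b2))


def global_clean(S, Q):
    """Leaf sets of the subtrees created in step 5, or None if no pair passes step 4."""
    S = frozenset(S)
    R = [frozenset([s]) for s in sorted(S)]                      # step 1
    created = []
    while len(R) > 3:                                            # step 7
        for A1, A2 in combinations(R, 2):                        # step 2
            A = A1 | A2                                          # step 3
            bound = (len(A) - 1) * (len(S - A) - 1) / 2
            if errors(A, S, Q) < bound:                          # step 4
                R = [X for X in R if X not in (A1, A2)] + [A]    # steps 5 and 6
                created.append(A)
                break
        else:
            return None                                          # the bound is violated
    return created


# Hypothetical caterpillar tree ab | c | d | ef with one quartet error in Q.
S = "abcdef"
true_splits = [frozenset("ab"), frozenset("abc"), frozenset("ef")]
Q = quartets_from_splits(S, true_splits)
Q[frozenset("abcd")] = topo("a", "c", "b", "d")
tree = global_clean(S, Q)
print(tree)
print(quartets_from_splits(S, tree) == quartets_from_splits(S, true_splits))
```

In this example the single error lies across the edge (ab | cdef), whose bound is 3/2, so the sketch still recovers the bipartitions of the true tree. The final step of connecting the last three subtrees adds no further bipartition, which is why the sketch returns only the leaf sets created along the way.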

A straightforward implementation of Global-Clean results in a time complexity of
$O(n^2 \cdot n^4) = O(n^6)$, since each quartet is checked at most $O(n^2)$ times. Section 2.2
presents a more careful implementation and analysis that yields an $O(n^4)$ global
edge cleaning algorithm, which is linear in the number of quartets.

2.2 An Efficient Implementation of Global-Clean 

The idea is as follows. First, observe that the most time-consuming step in Global-Clean
is step 4, i.e., the identification of discrepancies between the set of quartet topologies
induced by a candidate bipartition and those in the given set Q. Our idea is to avoid
considering the same quartet repeatedly as much as possible. In order to achieve this, for
each correct bipartition $(A, S - A)$, we record the set $Q(A, S - A) - Q$ using a linked list.
Let $A_1$ and $A_2$ denote the leaf sequences of $T_1$ and $T_2$, respectively, considered in step 2
of Global-Clean, and let $A = A_1 \cup A_2$, according to step 3. Then, in step 4, observe that

$Q(A, S - A) - Q = (Q(A_1, S - A) - Q) \cup (Q(A_2, S - A) - Q) \cup (Q'(A_1, A_2, S - A) - Q)$,

where $Q'(A_1, A_2, S - A)$ denotes the set of all quartet topologies $ab|cd$ with $a \in A_1$,
$b \in A_2$, and $c, d \in S - A$. Since $Q(A_1, S - A) - Q \subseteq Q(A_1, S - A_1) - Q$, we can compute
$Q(A_1, S - A) - Q$ from the list recorded for $(A_1, S - A_1)$ in at most
$|Q(A_1, S - A_1) - Q| = O(n^2)$ time; the list is this short because a recorded bipartition
satisfies the cleaning bound. Similarly, $Q(A_2, S - A) - Q$ can be computed in $O(n^2)$ time.
Moreover, at each execution of step 2 (except for the first time), we need only check $O(n)$
new possibilities of joining subtrees: joining the subtree resulting from the previous
iteration, say $T_1$, with each of the $O(n)$ other subtrees, say $T_2$. Since all the other
possibilities have been examined in previous iterations, their outcomes can be easily
retrieved. Because step 2 is repeated $n - 2$ times, the overall time required for computing
the sets $Q(A_1, S - A) - Q$ and $Q(A_2, S - A) - Q$ is $O(n \cdot n \cdot n^2) = O(n^4)$.
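
The three-way decomposition used in step 4 can be checked by brute force on a small example. The sets and the function names `Q_bip` and `Q_cross` are assumptions made for illustration; the check concerns the quartet families themselves, and removing the members of Q from both sides preserves the identity.

```python
from itertools import combinations


def topo(a, b, c, d):
    """Order-independent encoding of the quartet topology ab|cd."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def Q_bip(A, B):
    """Q(A, B): all topologies aa'|bb' with a, a' in A and b, b' in B."""
    return {topo(a, a2, b, b2)
            for a, a2 in combinations(sorted(A), 2)
            for b, b2 in combinations(sorted(B), 2)}


def Q_cross(A1, A2, rest):
    """Q'(A1, A2, rest): all topologies ab|cd with a in A1, b in A2 and c, d in rest."""
    return {topo(a, b, c, d)
            for a in A1 for b in A2
            for c, d in combinations(sorted(rest), 2)}


S = set("abcdefg")
A1, A2 = {"a", "b"}, {"c", "d", "e"}
A = A1 | A2
rest = S - A
whole = Q_bip(A, rest)
parts = [Q_bip(A1, rest), Q_bip(A2, rest), Q_cross(A1, A2, rest)]
print(len(whole), [len(p) for p in parts])
print(whole == set().union(*parts))
```

The three parts are pairwise disjoint, which is what allows the corresponding lists to be concatenated without scanning for duplicates.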

Next we show that computing the sets $Q'(A_1, A_2, S - A) - Q$ also globally requires
$O(n^4)$ time, thanks to some delicate data structures. For each possible clustering $\{T_1, T_2\}$
of two subtrees, we split the set $Q'(A_1, A_2, S - A)$ into two lists, $Q'_-(A_1, A_2)$ and
$Q'_+(A_1, A_2)$, depending on whether a topology contradicts or corresponds to the one in
the set Q. That is, $Q'_-(A_1, A_2) = Q'(A_1, A_2, S - A) - Q$. When subtrees $T_1$ and $T_2$ are
joined, the linked list for $Q(A, S - A) - Q$ associated to the new subtree $T' = \{T_1, T_2\}$ is
simply obtained by first adding $Q(A_1, S - A) - Q$ and $Q(A_2, S - A) - Q$, which requires
$O(n^2)$ time, and then adding the list $Q'_-(A_1, A_2)$, which requires $O(1)$ time if
$Q'_-(A_1, A_2)$ (and $Q'_+(A_1, A_2)$) is represented as a doubly-linked list. Note that
$Q'_-(A_1, A_2)$ need not be scanned as its quartets are distinct from those already added to
$Q(A, S - A) - Q$.

The $Q'_-$ and $Q'_+$ lists of the new possible clusterings involving T' are easy to obtain. If
$A_i$ denotes the leaf sequences of a subtree $T_i \neq T_1, T_2$, then
$Q'_-(A, A_i) = Q'_-(A_1, A_i) \cup Q'_-(A_2, A_i) - Q''(A_1, A_2, A_i)$ and
$Q'_+(A, A_i) = Q'_+(A_1, A_i) \cup Q'_+(A_2, A_i) - Q''(A_1, A_2, A_i)$,
where $Q''(A_1, A_2, A_i)$ denotes the members of $Q'(A_1, A_2, S - A)$ involving a leaf
sequence in $A_i$. We proceed by scanning first $Q'_-(A_1, A_2)$ and $Q'_+(A_1, A_2)$
to remove the occurrences of the elements of $Q''(A_1, A_2, A_i)$ in the $Q'_-$ and $Q'_+$ lists of
$\{T_1, T_i\}$ and $\{T_2, T_i\}$. Then we merge these disjoint lists in $O(1)$ time to obtain those
of $\{T', T_i\}$. Linking together all the occurrences of each quartet in the $Q'_-$ and $Q'_+$ lists
at the beginning of the algorithm enables us to remove each member of $Q''(A_1, A_2, A_i)$
from other $Q'_-$ and $Q'_+$ lists in $O(1)$ time.

We remark that at any time of the algorithm, each quartet appears only a constant
number of times in all $Q'_-$ and $Q'_+$ lists. If $q = \{a, b, c, d\}$ and $T_1, T_2, T_3, T_4$ are the
respective subtrees containing the leaf sequences a, b, c, d, then only the 6 clusterings
$\{T_1, T_2\}$, $\{T_1, T_3\}$, $\{T_1, T_4\}$, $\{T_2, T_3\}$, $\{T_2, T_4\}$, and $\{T_3, T_4\}$ resolve q, and hence q
belongs only to the $Q'_-$ and $Q'_+$ lists of these 6 clusterings. As each occurrence of a
quartet in the $Q'_-$ and $Q'_+$ lists is checked or removed only once during the algorithm,
the maintenance of the $Q'_-$ and $Q'_+$ lists requires $O(n^4)$ time in total. Now, these lists
(and their sizes) enable us to compute $|Q'(A_1, A_2, S - A) - Q| = |Q'_-(A_1, A_2)|$ in $O(1)$
time and thus $|Q(A, S - A) - Q|$ in $O(n^2)$ time, when needed.

As mentioned above, Global-Clean requires $n - 2$ clusterings. Before each clustering
we have to examine $O(n)$ new candidates, each requiring $O(n^2)$ time. Hence this
implementation requires $O(n \cdot n \cdot n^2) = O(n^4)$ time.



2.3 A Local Vertex Quartet Cleaning Algorithm 

Let S be a set of sequences that evolved on evolutionary tree T and let Q be a set of quartet
topologies inferred from S. Define a bipartition $(A, B)$ to be good if
$|Q(A, B) - Q| < (|A| - 1)(|B| - 1)/4$. A tripartition $(A, B, C)$ is good if each of its three
induced bipartitions is good, i.e., $|Q(A, B \cup C) - Q| < (|A| - 1)(|B \cup C| - 1)/4$,
$|Q(B, A \cup C) - Q| < (|B| - 1)(|A \cup C| - 1)/4$ and
$|Q(C, A \cup B) - Q| < (|C| - 1)(|A \cup B| - 1)/4$.

The following algorithm corrects all quartet errors across vertices of T that induce
good tripartitions. 

Algorithm Tripartition(Q)

1. Let $Tri(Q) := \emptyset$.

2. For every three sequences a, b and c

3. Let $A = \{a\}$; $B = \{b\}$; $C = \{c\}$.

4. For each $w \in S - \{a, b, c\}$

5. If $aw|bc \in Q$ then $A = A \cup \{w\}$.

6. If $bw|ac \in Q$ then $B = B \cup \{w\}$.

7. If $cw|ab \in Q$ then $C = C \cup \{w\}$.

8. If $(A, B, C)$ is good then let $Tri(Q) := Tri(Q) \cup \{(A, B, C)\}$.

9. Output $Tri(Q)$.
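
The nine steps above can be sketched as follows. The dictionary encoding of Q, the six-sequence example (three cherries ab, cd, ef around one central vertex) and the helper names are assumptions made for illustration. The sketch reads the definition of a good bipartition literally, so a part consisting of a single sequence is never good, since its bound is zero and the inequality is strict.

```python
from itertools import combinations


def topo(a, b, c, d):
    """Order-independent encoding of the quartet topology ab|cd."""
    return frozenset([frozenset([a, b]), frozenset([c, d])])


def good(P, S, Q):
    """(P, S - P) is good if |Q(P, S - P) - Q| < (|P| - 1)(|S - P| - 1)/4."""
    rest = S - P
    wrong = sum(1 for a, a2 in combinations(sorted(P), 2)
                for b, b2 in combinations(sorted(rest), 2)
                if Q[frozenset([a, a2, b, b2])] != topo(a, a2, b, b2))
    return wrong < (len(P) - 1) * (len(rest) - 1) / 4


def tripartition(S, Q):
    """Tri(Q): the good tripartitions grown from every triple of sequences."""
    S = frozenset(S)
    tri = set()                                          # step 1
    for a, b, c in combinations(sorted(S), 3):           # step 2
        A, B, C = {a}, {b}, {c}                          # step 3
        for w in sorted(S - {a, b, c}):                  # step 4
            t = Q[frozenset([a, b, c, w])]
            if t == topo(a, w, b, c):                    # step 5
                A.add(w)
            elif t == topo(b, w, a, c):                  # step 6
                B.add(w)
            else:                                        # step 7: cw|ab
                C.add(w)
        parts = [frozenset(A), frozenset(B), frozenset(C)]
        if all(good(P, S, Q) for P in parts):            # step 8
            tri.add(frozenset(parts))
    return tri                                           # step 9


# Hypothetical true tree: three cherries ab, cd, ef joined at one central vertex.
cherries = [frozenset("ab"), frozenset("cd"), frozenset("ef")]
Q_T = {}
for q in combinations("abcdef", 4):
    pair = next(ch for ch in cherries if ch <= set(q))   # every quartet contains a cherry
    Q_T[frozenset(q)] = frozenset([pair, frozenset(q) - pair])

Q_err = dict(Q_T)
Q_err[frozenset("abce")] = topo("a", "c", "b", "e")      # one quartet error

print(tripartition("abcdef", Q_T))
print(tripartition("abcdef", Q_err))
```

With only six sequences every bound is below one, so a single quartet error already removes the central tripartition from the output. On such a small instance the sketch therefore shows the algorithm declining to assert a vertex rather than correcting an error.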

To show that the above algorithm is a local vertex cleaning algorithm with cleaning
bound $(|A| - 1)(|B| - 1)/4$, it is first proven that $Tri(Q)$ contains all vertex induced
tripartitions of T that are good.

Theorem 3. Let v be a vertex of T that induces tripartition $(A, B, C)$. If $(A, B, C)$ is
good then $(A, B, C) \in Tri(Q)$.

Next we show that the tripartitions produced by the Tripartition algorithm are compatible. 
The following lemma will be useful. 

Lemma 4. Two tripartitions are compatible if and only if their induced bipartitions are
all compatible with each other.

Theorem 5. If $(A, B, C)$ and $(X, Y, Z)$ are both in $Tri(Q)$ then they are compatible.

Theorem 3 informs us that $Tri(Q)$ contains all good tripartitions of T and Theorem 5
informs us that $Tri(Q)$ contains only tripartitions compatible with the good tripartitions
of T. Let T' be the tree obtained by combining the tripartitions of $Tri(Q)$. Then $Q_{T'}$
is a corrected version of Q. Note that unlike algorithm Global-Clean described in Sect.
2.1, algorithm Tripartition always produces a tree and this tree is either the same as T
or is a contraction of T. A straightforward implementation of the Tripartition algorithm
runs in $O(n^3 \cdot n^4) = O(n^7)$ time. An $O(n^5)$ implementation has been found recently
by Della Vedova [7].

3 Simulation Study 

In this section, a simulation study is presented that addresses the utility of the quartet 
cleaning algorithms. In particular, the simulation study addresses questions of need and 
applicability: 

1. How common are quartet errors in sets of inferred quartet topologies?

2. How effective are the quartet cleaning algorithms at correcting quartet errors when 
they occur?

The approach of the simulation study was to simulate the evolution of sequences on an
evolutionary tree T. From these sequences, a set of quartet topologies was inferred using 
a quartet inference method. This set of quartet topologies was then compared to the actual 
quartet topologies induced by T. Any quartet whose inferred and actual topologies did 
not match was considered a quartet error. By mapping these quartet errors back onto the 
vertices and edges of T, it was possible to establish the number of quartet errors across 
the vertices and edges of T. From this information it was determined which vertices and 
edges of T could be cleaned by the global edge and local vertex cleaning algorithms 
described in this paper. 

The simulation study was extensive in that sequences were evolved under a broad
range of parameters where the sequence length, evolutionary tree topology and
evolutionary rates were varied. As well, three popular quartet inference methods, namely
Neighbor Joining [15], the Ordinal Quartet method [14] and Maximum Parsimony [10],
were used to infer the sets of quartet topologies from the simulated sequences.

The details of the simulation study are as follows. For Neighbor Joining and the
Ordinal Quartet method, distance matrices were generated from the sequences and
corrected relative to the Kimura two-parameter (K2P) model of evolution [18, page 456].
For Maximum Parsimony, the sequences were corrected for transition/transversion bias
by weighting the character-state transition matrix appropriately [18, pages 422-423].
Evolutionary trees and sequences were created in a manner similar to [6] using code
adapted from program listtree in the PAML phylogeny analysis package [20]: evolutionary
tree topologies were generated by adding taxa at random, edge-lengths were
assigned according to a specified mean edge-length, and sequences were evolved on
these evolutionary trees using the K2P model with transition-transversion bias k = 0.5.
The simulation examined 1000 randomly-generated topologies on 10 sequences × 5
mean edge-lengths per topology × 100 edge-length sets per topology × 3 sequence
lengths = 1,500,000 sequence datasets, where the set of mean edge-lengths considered
was {0.025, 0.1, 0.25, 0.5, 0.75} and the set of sequence lengths considered was
{50, 200, 2000}. The simulation code consists of a number of C-language and shell-script
programs that were compiled and run under the UNIX system on several of the
SUN workstations owned by the computational biology group at McMaster University.
The simulations took 5 days to complete.

The results of the simulation study are summarized in Table 1. Consider how the
results address the two questions posed at the beginning of this section.

Firstly, the results clearly establish that quartet errors are prevalent, independent of 
the quartet topology inference method used. The number of quartet errors decreases as 
sequence length increases and mean edge-length decreases but remains significant even 
for sequence length 2000 and mean edge-length 0.025. Hence, there is a definite need 
for quartet cleaning algorithms. 

Secondly, a comparison of the parenthesized and unparenthesized values in this table
establishes the effectiveness of these algorithms. Both algorithms decrease the number 
of quartet errors significantly under a wide variety of conditions. For example, under 
the Ordinal Quartet Method with mean edge-length 0.1 and sequence length 200, the 
increase in accuracy is approximately 25%. When the mean edge-length is large and/or 
the sequence length is small, the increased accuracy is dramatic. 

Table 1. Performance of the Quartet Cleaning Algorithms Under Simulation. The
unparenthesized number is the average percent of all evolutionary trees (vertices per
evolutionary tree) that have no quartet errors (no quartet errors across them) after global
edge cleaning (local vertex cleaning) has been applied. The parenthesized number is the
average percent of all evolutionary trees (vertices per evolutionary tree) that have no
quartet errors (no quartet errors across them) before quartet cleaning is applied.



Quartet Inference        Mean Edge    Global Edge (% of Trees Cleanable),                Local Vertex (Average % of Vertices per Tree
Method                   Length       by Sequence Length                                 that are Cleanable), by Sequence Length
                                      50               200              2000             50               200              2000

Maximum Parsimony        0.025        10.48% (0.45%)   62.85% (10.85%)  95.02% (56.67%)  38.98% (29.20%)  76.15% (59.82%)  94.80% (87.27%)
(Corrected)              0.1          38.62% (0.22%)   79.60% (6.17%)   92.34% (27.06%)  55.91% (28.82%)  80.76% (55.57%)  89.90% (74.01%)
                         0.25         26.88% (0.01%)   64.44% (0.71%)   78.48% (6.83%)   45.18% (20.09%)  67.96% (37.73%)  77.03% (53.93%)
                         0.5          3.25% (0.00%)    23.46% (0.03%)   56.33% (0.87%)   23.50% (10.59%)  43.47% (22.38%)  61.71% (38.42%)
                         0.75         0.30% (0.00%)    3.81% (0.00%)    27.46% (0.08%)   13.08% (6.03%)   25.46% (14.11%)  46.57% (28.48%)

Neighbor Joining         0.025        16.36% (0.73%)   70.47% (13.49%)  95.67% (57.18%)  43.79% (32.53%)  79.85% (62.96%)  95.06% (87.46%)
(Corrected)              0.1          47.10% (0.34%)   81.36% (6.92%)   91.72% (24.57%)  60.86% (32.04%)  81.57% (56.65%)  88.97% (72.31%)
                         0.25         30.85% (0.02%)   64.18% (0.84%)   76.48% (6.16%)   47.69% (23.19%)  67.50% (38.44%)  75.48% (52.55%)
                         0.5          3.91% (0.00%)    22.27% (0.07%)   51.80% (0.79%)   26.91% (16.37%)  43.25% (26.27%)  58.88% (38.08%)
                         0.75         0.41% (0.00%)    3.63% (0.00%)    21.47% (0.06%)   16.78% (11.33%)  27.56% (19.41%)  43.48% (30.49%)

Ordinal Quartet Method   0.025        20.49% (1.21%)   74.33% (17.11%)  96.12% (59.25%)  47.79% (37.36%)  82.19% (66.38%)  95.45% (88.19%)
(Corrected)              0.1          64.15% (1.21%)   86.91% (12.38%)  93.84% (34.53%)  71.92% (40.32%)  86.32% (63.45%)  91.50% (78.00%)
                         0.25         63.64% (0.16%)   81.72% (2.42%)   87.31% (13.82%)  68.85% (30.29%)  81.13% (47.25%)  85.44% (65.10%)
                         0.5          39.03% (0.04%)   67.26% (0.19%)   81.75% (1.82%)   52.41% (23.67%)  69.18% (32.06%)  79.47% (45.22%)
                         0.75         12.32% (0.00%)   35.18% (0.01%)   63.60% (0.16%)   34.28% (18.56%)  49.52% (24.86%)  66.47% (33.28%)

Abstract. I describe a C++ program for computing the smallest enclosing ball
of a point set in d-dimensional space, using floating-point arithmetic only. The
program is very fast for d < 20, robust and simple (about 300 lines of code,
excluding prototype definitions). Its new features are a pivoting approach
resembling the simplex method for linear programming, and a robust update scheme for
intermediate solutions. The program with complete documentation following the
literate programming paradigm [3] is available on the Web.¹

1 Introduction 

The smallest enclosing ball (or Euclidean 1-center) problem is a classical problem of
computational geometry. It has a long history which dates back to 1857 when Sylvester 
formulated it for points in the plane [8]. The first optimal linear-time algorithm for 
fixed dimension was given by Megiddo in 1982 [4]. In 1991, Emo Welzl developed a 
simple randomized method to solve the problem in expected linear time [9]. In contrast 
to Megiddo’s method, his algorithm is easy to implement and very fast in practice, for 
dimensions 2 and 3. In higher dimensions, a heuristic move-to-front variant considerably 
improves the performance. 

The roots of the program I will describe go back to 1991 when I first implemented
Emo Welzl’s new method. Using the move-to-front variant I was able to solve problems
on 5000 points up to dimension d = 10 (see [9] for all results). Back then, the program
was written in MODULA-2 (the language I had learned in my undergraduate CS courses),
and it was running on a 20 MHz 80386 PC.

After the algorithm and the implementation results had been published, we constantly
received requests for source code, from an amazingly wide range of application areas. To
name some, there was environmental science (design and optimization of solvents),
pattern recognition (finding reference points), biology (protein analysis), political science
(analysis of party spectra), mechanical engineering (stress optimization) and computer
graphics (ray tracing, culling).

Soon it became clear that the MODULA-2 source was not of great help in serving these
requests; people wanted C or C++ code. Independently, at least two persons ported the
undocumented program to C. (One of them remarked that “although we don’t understand
the complete procedure, we are confident that the algorithm is perfect”.) Vishwa Ranjan
kindly made his carefully adapted C program accessible to me; subsequently, I was able
to distribute a C version. (An independent implementation by David Eberly for the cases
d = 2, 3 based on [9] is available online.²)

* This work was supported by grants from the Swiss Federal Office for Education and Science
(Projects ESPRIT IV LTR No. 21957 CGAL and No. 28155 GALIA).

¹ http://www.inf.ethz.ch/personal/gaertner/miniball.html

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 325-338, 1999.

© Springer-Verlag Berlin Heidelberg 1999

Another shortcoming of my program, one that persisted in the C code, was the lack of
numerical stability, an issue not yet fashionable in computational geometry at that time. 
The main primitive of the code was to solve a linear system, and this was done with plain 
Gaussian elimination, with no special provisions to deal with ill-conditioned systems. 
In my defense I must say that the code was originally only made to get test results for 
Emo’s paper, and in the tests with the usual random point sets I did for that, I discovered 
no problems, of course. 

Others did. David White, who was developing code for the more general problem 
of finding the smallest ball enclosing balls, noticed unstable behavior of my program, 
especially in higher dimensions. In his code, he replaced the naive Gaussian elimination 
approach by a solution method based on singular value decomposition (SVD). This 
made his program pretty robust, and a C++ version (excluding the SVD routines which 
are taken from Numerical Recipes in C [6]) is available from David White’s Web page.³

Meanwhile, I got involved in the CGAL project, a joint effort of seven European
sites to build a C++ library of computational geometry algorithms.⁴ To prepare my code
for the library, I finally wrote a C++ version of it from scratch. As the main improvement,
the code was no longer solving a complete linear system in every step, but was updating
previous information instead. A “CGALized” version of this code is now contained in
the library, and using it together with any exact (multiple precision) number type results
in error-free computations.

Still, the numerical problems arising in floating-point computations were not solved. 
Stefan Gottschalk, one of the first users of my new C++ code, encountered singularities 
in the update routines, in particular if input points are (almost) cospherical, very close 
together or even equal. The effect is that center and squared radius of the ball maintained 
by the algorithm can become very large or even undefined due to exponent overflow, 
even though the smallest enclosing ball problem itself is well-behaved in the sense that 
small perturbations of the input points have only a small influence on the result. 

As it turned out, previous implementations suffered from an inappropriate represen- 
tation of intermediate balls that ignores the good-naturedness of the problem. The new 
representation scheme respects the underlying geometry — it actually resulted from a 
deeper understanding of the geometric situation — and solves most of the problems. 

The second ingredient is a new high-level algorithm replacing the move-to-front 
method. Its goal is to decrease the overall number of intermediate solutions computed 
during the algorithm. This is achieved by reducing the problem to a small number of
calls to the move-to-front method, with only a small point set in each call. These calls 
can then be interpreted as ‘pivot steps’ of the method. The advantages are a substantial 
improvement in runtime for dimensions d > 10, and much more robust behavior. 

The result is a program which I think reaches a high level of efficiency and stability
per line of code. In simplicity, it is almost comparable to the popular approximation
algorithms from the Graphics Gems collection [7,10]; because the latter usually
compute suboptimal balls, the authors stress their simplicity as the main feature. The code
presented here shares this feature, while computing the optimal ball.

² http://www.cs.unc.edu/~eberly/
³ http://vision.ucsd.edu/~dwhite
⁴ http://www.cs.uu.nl/CGAL



2 The Algorithms 

Given an n-point set $P = \{p_1, \ldots, p_n\} \subseteq \mathbb{R}^d$, let $\mathrm{mb}(P)$ denote the ball of smallest
radius that contains P. $\mathrm{mb}(P)$ exists and is unique. For $P, B \subseteq \mathbb{R}^d$, $P \cap B = \emptyset$, let
$\mathrm{mb}(P, B)$ be the smallest ball that contains P and has all points of B on its boundary. We
have $\mathrm{mb}(P) = \mathrm{mb}(P, \emptyset)$, and if $\mathrm{mb}(P, B)$ exists, it is unique. Finally, define
$\mathrm{MB}(B) := \mathrm{mb}(\emptyset, B)$ to be the smallest ball with all points of B on the boundary (if it exists).

A support set of $(P, B)$ is an inclusion-minimal subset S of P with $\mathrm{mb}(P, B) = \mathrm{mb}(S, B)$.
If the points in B are affinely independent, there always exists a support set
of size at most $d + 1 - |B|$, and we have $\mathrm{mb}(S, B) = \mathrm{MB}(S \cup B)$.

If $p \notin \mathrm{mb}(P, B)$, then p lies on the boundary of $\mathrm{mb}(P \cup \{p\}, B)$, provided the latter
exists; that is, $\mathrm{mb}(P \cup \{p\}, B) = \mathrm{mb}(P, B \cup \{p\})$. All this is well-known, see e.g.
[9] and the references there.

The basis of our method is Emo Welzl’s move-to-front heuristic to compute $\mathrm{mb}(P, B)$
if it exists [9]. The method keeps the points in an ordered list L which gets reorganized as
the algorithm runs. Let $L_i$ denote the length-i prefix of the list, $p_i$ the element at position
i in L. Initially, $L = L_n$ stores the points of P in random order.

Algorithm 1.

mtf_mb(L_n, B):
(* returns mb(L_n, B) *)
mb := MB(B)
IF |B| = d + 1 THEN
    RETURN mb
END
FOR i = 1 TO n DO
    IF p_i ∉ mb THEN
        mb := mtf_mb(L_{i-1}, B ∪ {p_i})
        update L by moving p_i to the front
    END
END
RETURN mb

This algorithm computes $\mathrm{mb}(P, B)$ incrementally, by adding one point after another
from the list. One can prove that during the call to mtf_mb($L_n$, $\emptyset$), all sets B that come
up in recursive calls are affinely independent. Together with the above mentioned facts,
this ensures the correctness of the method. By induction, one can also show that upon
termination, a support set of $(P, B)$ appears as a prefix $L_s$ of the list L, and below we
will assume that the algorithm returns the size s along with mb.
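
A compact sketch of Algorithm 1 in Python may help to fix ideas. It is not the C++ program described in this paper: the primitive MB(B) is computed here by solving a small linear system with plain Gaussian elimination, which is exactly the naive approach the paper sets out to improve on, and the tolerance `1e-12`, the function names and the test points are assumptions made for illustration.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x


def MB(B, d):
    """(center, squared radius) of the smallest ball with all points of B on its boundary."""
    if not B:
        return [0.0] * d, -1.0                 # empty ball: contains no point
    q0 = B[0]
    V = [[x - y for x, y in zip(q, q0)] for q in B[1:]]
    # The center is q0 + sum_j lam_j V_j, at equal distance from all points of B.
    M = [[2.0 * sum(a * b for a, b in zip(u, v)) for v in V] for u in V]
    rhs = [sum(a * a for a in u) for u in V]
    lam = solve(M, rhs) if V else []
    off = [sum(l * v[k] for l, v in zip(lam, V)) for k in range(d)]
    return [a + b for a, b in zip(q0, off)], sum(o * o for o in off)


def mtf_mb(L, n, B, d):
    """Smallest ball containing L[:n] with the points of B on its boundary."""
    c, r2 = MB(B, d)
    if len(B) == d + 1:
        return c, r2
    for i in range(n):
        p = L[i]
        if sum((a - b) ** 2 for a, b in zip(p, c)) > r2 + 1e-12:   # p outside mb
            c, r2 = mtf_mb(L, i, B + [p], d)
            L.insert(0, L.pop(i))                                  # move p to the front
    return c, r2


points = [(0.0, 0.0), (4.0, 0.0), (1.0, 1.0), (2.0, -1.0), (3.0, 1.0)]
center, sq_radius = mtf_mb(list(points), len(points), [], 2)
print(center, sq_radius)
```

The recursive call only permutes the first i positions of the list, so the point p is still at position i when it is moved to the front afterwards. The sketch assumes, as the text does, that the sets B arising in the recursion are affinely independent; for degenerate input the linear system may become singular.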

The practical efficiency comes from the fact that 'important' points (which for the purpose of the method are points outside the current ball) are moved to the front and will therefore be processed early in subsequent recursive calls. The effect is that the ball maintained by the algorithm gets large fast.
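As an illustration of Algorithm 1, the following is a minimal sketch restricted to the plane (d = 2), where MB(B) has a closed form for |B| ≤ 3. The helper names `ball_through` and `contains`, the encoding of the empty ball by a negative squared radius, and the small relative tolerance are assumptions made for this sketch; they are not taken from the original code.

```python
def ball_through(boundary):
    """MB(B) in the plane for |B| <= 3, as (cx, cy, r2); r2 = -1 encodes the empty ball."""
    if len(boundary) == 0:
        return (0.0, 0.0, -1.0)
    if len(boundary) == 1:
        return (boundary[0][0], boundary[0][1], 0.0)
    (ax, ay), (bx, by) = boundary[0], boundary[1]
    if len(boundary) == 2:
        cx, cy = (ax + bx) / 2.0, (ay + by) / 2.0
        return (cx, cy, (ax - cx) ** 2 + (ay - cy) ** 2)
    # Circumcircle of three affinely independent points, computed relative to a.
    bx, by = bx - ax, by - ay
    cx, cy = boundary[2][0] - ax, boundary[2][1] - ay
    det = 2.0 * (bx * cy - by * cx)
    b2, c2 = bx * bx + by * by, cx * cx + cy * cy
    ux, uy = (cy * b2 - by * c2) / det, (bx * c2 - cx * b2) / det
    return (ax + ux, ay + uy, ux * ux + uy * uy)


def contains(ball, p, tol=1e-12):
    """True if p lies in the ball, up to a small relative tolerance (an assumption)."""
    cx, cy, r2 = ball
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r2 * (1.0 + tol)


def mtf_mb(points, n, boundary):
    """Smallest ball containing points[:n] with `boundary` on its boundary.

    Reorders points[:n] in place by the move-to-front rule.
    """
    mb = ball_through(boundary)
    if len(boundary) == 3:          # |B| = d + 1 with d = 2
        return mb
    for i in range(n):
        p = points[i]
        if not contains(mb, p):
            mb = mtf_mb(points, i, boundary + [p])
            points.insert(0, points.pop(i))   # move p to the front
    return mb


pts = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0), (1.0, 0.0)]
print(mtf_mb(pts, len(pts), []))   # (2.0, 0.0, 4.0): center (2, 0), squared radius 4
```

The sketch passes B as an explicit list for readability; the text describes a stack that is pushed and popped around the recursive call instead.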

The second algorithm uses the move-to-front variant only as a subroutine for small point sets. Large-scale problems are handled by a pivoting variant which in every iteration adds the point which is most promising in the sense that it has largest distance from the current ball. Under this scheme, the ball gets large even faster, and the method usually terminates after very few iterations. (As the test results in Section 5 show, the move-to-front variant will still be faster for $d$ not too large, but there are good reasons to prefer the pivoting variant in any case.)

Let $e(p, \mathrm{mb})$ denote the excess of $p$ w.r.t. mb, defined as $\|p - c\|^2 - r^2$, with $c$ and $r^2$ the center and squared radius of mb.

Algorithm 2.

pivot_mb(L_n):
    (* returns mb(L_n) *)
    t := 1
    (mb, s) := mtf_mb(L_t, ∅)
    REPEAT
        (* Invariant: mb = mb(L_t) = MB(L_s), s ≤ t *)
        choose k > t with e := e(p^k, mb) maximal
        IF e > 0 THEN
            (mb, s') := mtf_mb(L_s, {p^k})
            update L by moving p^k to the front
            t := s + 1
            s := s' + 1
        END
    UNTIL e ≤ 0
    RETURN mb

Because mb gets larger in every iteration, the procedure eventually terminates. The computation of (mb, s') can be viewed as a 'pivot step' of the method, involving at most $d + 2$ points. The choice of $k$ is done according to a heuristic 'pivot rule', with the intention of keeping the overall number of pivot steps small. With this interpretation, the procedure pivot_mb is similar in spirit to the simplex method for linear programming [1], and it has in fact been designed with regard to the simplex method's efficiency in practice.
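The pivot rule can be sketched as follows: among the points beyond position t, pick the one with the largest excess with respect to the current ball. The function names and the 0-based indexing are assumptions of this sketch; the ball is given by its center and squared radius.

```python
def excess(p, center, r2):
    """Excess of p w.r.t. the ball: squared distance to the center minus r2."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, center)) - r2


def choose_pivot(points, t, center, r2):
    """Return (k, e) with k >= t (0-based) maximizing the excess e.

    Returns (None, -inf) if there is no candidate point.
    """
    best_k, best_e = None, float("-inf")
    for k in range(t, len(points)):
        e = excess(points[k], center, r2)
        if e > best_e:
            best_k, best_e = k, e
    return best_k, best_e


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (0.0, 3.0)]
# Ball through the first two points: center (0.5, 0), squared radius 0.25.
print(choose_pivot(points, 2, (0.5, 0.0), 0.25))   # (2, 20.0)
```

A positive excess means the chosen point lies outside the ball, so a pivot step is needed; a non-positive maximum means all remaining points are already covered.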

3 The Primitive Operation 

During a call to algorithm pivot_mb, all nontrivial computations take place in the primitive operation of computing $\mathrm{MB}(B)$ for a given set $B$ in the subcalls to mtf_mb. The algorithm guarantees that $B$ is always a set of affinely independent points, from which $|B| \le d + 1$ follows. In that case, $\mathrm{MB}(B)$ is determined by the unique circumsphere of the points in $B$ with center restricted to the affine hull of $B$. This means, the center $c$ and squared radius $r^2$ satisfy the following system of equations, where $B = \{q_0, \ldots, q_{m-1}\}$, $m \le d + 1$.

\[
\begin{aligned}
(q_i - c)^T (q_i - c) &= r^2, \qquad i = 0, \ldots, m-1, \\
\sum_{i=0}^{m-1} \lambda_i q_i &= c, \\
\sum_{i=0}^{m-1} \lambda_i &= 1.
\end{aligned}
\]

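Defining $Q_i := q_i - q_0$, for $i = 0, \ldots, m-1$ and $C := c - q_0$, the system can be rewritten as

\[
\begin{aligned}
C^T C &= r^2, \\
(Q_i - C)^T (Q_i - C) &= r^2, \qquad i = 1, \ldots, m-1, \\
\sum_{i=1}^{m-1} \lambda_i Q_i &= C.
\end{aligned}
\tag{1}
\]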

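Substituting $C$ with $\sum_{i=1}^{m-1} \lambda_i Q_i$ in the equations (1), we deduce a linear system in the variables $\lambda_1, \ldots, \lambda_{m-1}$ which we can write as

\[
A_B \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_{m-1} \end{pmatrix}
= \begin{pmatrix} Q_1^T Q_1 \\ \vdots \\ Q_{m-1}^T Q_{m-1} \end{pmatrix},
\tag{2}
\]

where

\[
A_B := \begin{pmatrix}
2 Q_1^T Q_1 & \cdots & 2 Q_1^T Q_{m-1} \\
\vdots & & \vdots \\
2 Q_1^T Q_{m-1} & \cdots & 2 Q_{m-1}^T Q_{m-1}
\end{pmatrix}.
\tag{3}
\]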



Computing the values of $\lambda_1, \ldots, \lambda_{m-1}$ amounts to solving the linear system (2). $C$ and $r^2$ are then easily obtained via

\[
C = \sum_{i=1}^{m-1} \lambda_i Q_i, \qquad r^2 = C^T C.
\tag{4}
\]

We refer to $C$ as the relative center w.r.t. the (ordered) set $B$.
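To make the primitive operation concrete, here is a minimal sketch that sets up system (2) for a list of affinely independent points, solves it, and recovers the center and squared radius via (4). The plain Gaussian elimination and the function names `solve` and `circumsphere` are choices made for this sketch, not the device the paper ends up using.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def circumsphere(points):
    """Center, squared radius and lambda_1..lambda_{m-1} of MB(B) for B = points."""
    q0 = points[0]
    Q = [[pk - q0k for pk, q0k in zip(p, q0)] for p in points[1:]]
    A = [[2.0 * dot(Qi, Qj) for Qj in Q] for Qi in Q]        # matrix (3)
    v = [dot(Qi, Qi) for Qi in Q]                             # right-hand side of (2)
    lam = solve(A, v)
    C = [sum(l * Qi[k] for l, Qi in zip(lam, Q)) for k in range(len(q0))]   # (4)
    center = [q0k + Ck for q0k, Ck in zip(q0, C)]
    return center, dot(C, C), lam


print(circumsphere([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]))
# ([1.0, 1.0], 2.0, [0.5, 0.5])
```

Because the center is expressed in the affine hull of B, the same code handles fewer than d + 1 points, for example two points in 3-space.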



4 The Implementation 

Algorithms 1 and 2 are implemented in a straightforward manner, following the pseudocode given above. In case of algorithm mtf_mb, the set $B$ does not appear as a formal parameter but is updated before and after the recursive call by 'pushing' resp. 'popping' the point $p^i$. This stack-like behavior of $B$ also makes it possible to implement the primitive operation in a simple, robust and efficient way. More precisely, the algorithm maintains a device for solving system (2) which can conveniently be updated if $B$ changes. The update is easy when element $p^i$ is removed from $B$: we just need to remember the status prior to the addition of $p^i$. In the course of this addition, however, some real work is necessary.

A possible device for solving system (2) is the explicit inverse $A_B^{-1}$ of the matrix $A_B$ defined in (3), along with the vector

\[
v_B := \begin{pmatrix} Q_1^T Q_1 \\ \vdots \\ Q_{m-1}^T Q_{m-1} \end{pmatrix}.
\]

Having this inverse available, it takes just a matrix-vector multiplication to obtain the values $\lambda_1, \ldots, \lambda_{m-1}$ that define $C$ via (4).

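Assume $B$ is enlarged by pushing another point $q_m$. Define $B' = B \cup \{q_m\}$. Let's analyze how $A_{B'}^{-1}$ can be obtained from $A_B^{-1}$. We have

\[
A_{B'} = \begin{pmatrix}
 & & & 2 Q_1^T Q_m \\
 & A_B & & \vdots \\
 & & & 2 Q_{m-1}^T Q_m \\
2 Q_1^T Q_m & \cdots & 2 Q_{m-1}^T Q_m & 2 Q_m^T Q_m
\end{pmatrix},
\]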

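and it is not hard to check that this equation can be written as

\[
A_{B'} = L \begin{pmatrix}
 & & & 0 \\
 & A_B & & \vdots \\
 & & & 0 \\
0 & \cdots & 0 & z
\end{pmatrix} L^T,
\]

where

\[
L = \begin{pmatrix}
1 & & & 0 \\
 & \ddots & & \vdots \\
 & & 1 & 0 \\
\mu_1 & \cdots & \mu_{m-1} & 1
\end{pmatrix},
\qquad
\mu := \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_{m-1} \end{pmatrix}
= A_B^{-1} \begin{pmatrix} 2 Q_1^T Q_m \\ \vdots \\ 2 Q_{m-1}^T Q_m \end{pmatrix},
\tag{5}
\]

and

\[
z = 2 Q_m^T Q_m - (2 Q_1^T Q_m, \ldots, 2 Q_{m-1}^T Q_m) \, A_B^{-1}
\begin{pmatrix} 2 Q_1^T Q_m \\ \vdots \\ 2 Q_{m-1}^T Q_m \end{pmatrix}.
\tag{6}
\]

This implies

\[
A_{B'}^{-1} = (L^T)^{-1} \begin{pmatrix}
 & & & 0 \\
 & A_B^{-1} & & \vdots \\
 & & & 0 \\
0 & \cdots & 0 & 1/z
\end{pmatrix} L^{-1},
\tag{7}
\]

where

\[
L^{-1} = \begin{pmatrix}
1 & & & 0 \\
 & \ddots & & \vdots \\
 & & 1 & 0 \\
-\mu_1 & \cdots & -\mu_{m-1} & 1
\end{pmatrix}.
\]

Expanding (7) then gives the desired update formula

\[
A_{B'}^{-1} = \begin{pmatrix}
A_B^{-1} + \mu \mu^T / z & -\mu / z \\
-\mu^T / z & 1/z
\end{pmatrix},
\tag{8}
\]

with $\mu$ and $z$ as defined in (5) and (6).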
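The update (8) can be checked on a small instance. The sketch below takes $A_B^{-1}$ as a list of rows, the vector $u = (2 Q_i^T Q_m)_i$ and the scalar $2 Q_m^T Q_m$, and returns $A_{B'}^{-1}$ together with $z$; the function name and argument layout are assumptions of this sketch.

```python
def push_inverse(a_inv, u, two_qm_qm):
    """Return (A_{B'}^{-1}, z) from A_B^{-1}, u = (2 Q_i^T Q_m)_i and 2 Q_m^T Q_m."""
    n = len(u)
    mu = [sum(a_inv[i][j] * u[j] for j in range(n)) for i in range(n)]    # (5)
    z = two_qm_qm - sum(u[i] * mu[i] for i in range(n))                    # (6)
    top = [[a_inv[i][j] + mu[i] * mu[j] / z for j in range(n)] + [-mu[i] / z]
           for i in range(n)]
    bottom = [-m / z for m in mu] + [1.0 / z]
    return top + [bottom], z                                               # (8)


# B = {q0, q1} with Q1 = (1, 0); push q2 with Q2 = (1, 1).
# A_B = (2), so A_B^{-1} = (0.5); u = (2 Q1.Q2) = (2); 2 Q2.Q2 = 4.
inv, z = push_inverse([[0.5]], [2.0], 4.0)
print(inv, z)   # [[1.0, -0.5], [-0.5, 0.5]] 2.0
```

The result agrees with inverting $A_{B'} = \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}$ directly. Every entry is divided by $z$, which is where the sensitivity to small $z$ comes from.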

Equation (8) shows that $A_B$ may become ill-conditioned (and the entries of $A_B^{-1}$ very large and unreliable), if $z$ evaluates to a very small number. The subsequent lemma develops a geometric interpretation of $z$ from which we can see that this happens exactly if the new point $q_m$ is very close to the affine hull of the previous ones. This can be the case e.g. if input points are very close together or even equal. To deal with such problems, we need a device that stays bounded in every update operation.

As it turns out, a suitable device is the $(d \times d)$-matrix

\[
M_B := 2 Q_B A_B^{-1} Q_B^T,
\]

where

\[
Q_B := (Q_1 \cdots Q_{m-1})
\]

stores the points $Q_i$ as columns. Lemma 1 below proves that the entries of $M_B$ stay bounded, no matter what. We will also see how the new center is obtained from $M_B$, which is not clear anymore now.

Lemma 1.

(i) With $\mu$ as in (5), we have

\[
\sum_{i=1}^{m-1} \mu_i Q_i = \bar{Q}_m,
\]

where $\bar{Q}_m$ is the projection of $Q_m$ onto the subspace spanned by the $Q_i$.

(ii) $M_B Q_m = \bar{Q}_m$.

(iii) $z = 2 (Q_m - \bar{Q}_m)^T (Q_m - \bar{Q}_m)$, i.e. $z$ is twice the squared distance from $Q_m$ to its projection.

(iv) If $C$ and $r^2$ are relative center and squared radius w.r.t. $B$, then the new relative center $C'$ and squared radius $r'^2$ (w.r.t. $B'$) satisfy

\[
C' = C + \frac{e}{z} (Q_m - \bar{Q}_m),
\tag{9}
\]

\[
r'^2 = r^2 + \frac{e^2}{2z},
\tag{10}
\]

where

\[
e = (Q_m - C)^T (Q_m - C) - r^2.
\]

(v) $M_B$ is updated according to

\[
M_{B'} = M_B + \frac{2}{z} (Q_m - \bar{Q}_m)(Q_m - \bar{Q}_m)^T.
\tag{11}
\]

The proof involves only elementary calculations and is omitted here. Property (ii) gives $M_B$ an interpretation as a linear function: $M_B$ is the projection onto the linear subspace spanned by $Q_1, \ldots, Q_{m-1}$. Furthermore, property (v) implies that $M_B$ stays bounded. This of course does not mean that no 'bad' errors can occur anymore. In (11), small errors in $Q_m - \bar{Q}_m$ can get hugely amplified if $z$ is close to zero. Still, $M_B$ degrades gracefully in this case, and the typical relative error in the final ball is by an order of magnitude smaller if the device $M_B$ is used instead of $A_B^{-1}$.

The lemma naturally suggests an algorithm to obtain $C'$, $r'^2$ and $M_{B'}$ from $C$, $r^2$ and $M_B$, using the values $\bar{Q}_m$, $e$ and $z$.
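A minimal sketch of such a push operation follows. It keeps $M_B$ only implicitly, as the list of orthogonal vectors $Q_k - \bar{Q}_k$ and the values $z_k$ from (11), and updates the relative center and squared radius by (9) and (10). The dictionary layout and the function names are assumptions of this sketch, and no safeguard against tiny $z$ is included.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def new_state(dim):
    """State for B = {q0}: relative center 0, squared radius 0, empty M_B."""
    return {"C": [0.0] * dim, "r2": 0.0, "dirs": [], "zs": []}


def push(state, Q):
    """Push a point given by Q = q - q0; returns z of Lemma 1(iii)."""
    # Qbar = M_B Q, evaluated from the implicit sum of rank-one terms in (11).
    Qbar = [0.0] * len(Q)
    for d, z_k in zip(state["dirs"], state["zs"]):
        coef = 2.0 / z_k * dot(d, Q)
        Qbar = [qb + coef * dk for qb, dk in zip(Qbar, d)]
    diff = [q - qb for q, qb in zip(Q, Qbar)]        # Q_m - Qbar_m
    z = 2.0 * dot(diff, diff)                        # Lemma 1(iii)
    QC = [q - c for q, c in zip(Q, state["C"])]
    e = dot(QC, QC) - state["r2"]                    # excess w.r.t. the old ball
    f = e / z
    state["C"] = [c + f * dk for c, dk in zip(state["C"], diff)]   # (9)
    state["r2"] += e * e / (2.0 * z)                                # (10)
    state["dirs"].append(diff)                                      # (11), implicit
    state["zs"].append(z)
    return z


s = new_state(2)          # q0 = (0, 0)
push(s, [1.0, 0.0])       # q1 = (1, 0)
push(s, [1.0, 1.0])       # q2 = (1, 1)
print(s["C"], s["r2"])    # [0.5, 0.5] 0.5
```

Popping a point then amounts to dropping the last stored vector and restoring the previous center and radius, which matches the stack-like use of $B$ described above.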

As already mentioned, even $M_B$ may get inaccurate, in consequence of a very small value of $z$ before. The strategy to deal with this is very simple: we ignore push operations leading to such dangerously small values! In the ambient algorithm mtf_mb this means that the point to be pushed is treated as if it were inside the current ball (in pivot_mb the push operation is never dangerous, because we push onto an empty set $B$). Under this scheme, it could happen that points end up outside the final ball computed by mtf_mb, but they will not be very far outside, if we choose the threshold for $z$ appropriately.

The criterion is that we ignore a push operation if and only if the relative size of $z$ is small, meaning that

\[
\frac{z}{r_{\mathrm{curr}}^2} < \varepsilon
\tag{12}
\]

for some constant $\varepsilon$, where $r_{\mathrm{curr}}^2$ is the current squared radius. Now consider a subcall to mtf_mb($L_s$, $\{p^k\}$) inside the algorithm pivot_mb, and assume that a point $p \in L_s$ ends up outside the ball $\mathrm{mb}_0$ with support set $S_0$ and radius $r_0$ computed by this subcall.

One can check that after the last time the query '$p^i \notin \mathrm{mb}$?' has been executed with $p^i$ being equal to $p$ in mtf_mb, no successful push operations have occurred anymore. It follows that $\mathrm{mb} = \mathrm{mb}_0$ in this last query, the query had a positive answer (because $p$ lies outside), and the subsequent push operation failed. This means, we had $z / r_0^2 < \varepsilon$ at that time.

Let $r_{\max}$ denote the radius of $\mathrm{mb}(L_s, \{p^k\})$. Because of (10), we also had $e^2 / (2z) \le r_{\max}^2$ at the time of the failing push operation, where $e$ is the excess of $p$ w.r.t. a ball $\mathrm{MB}(B \cup \{p^k\})$ with $B \subseteq S_0$. We then get

\[
\frac{e^2}{r_0^4} \le \frac{2z}{r_0^2} \cdot \frac{r_{\max}^2}{r_0^2} < 2 \varepsilon \, \frac{r_{\max}^2}{r_0^2}.
\]

Assuming that $r_{\max}$ is not much larger than $r_0$ (we expect push operations to fail rather at the end of the computation, when the ball is already large), we can argue that

\[
\frac{e}{r_0^2} = O(\sqrt{\varepsilon}).
\]

Moreover, because $\mathrm{mb}_0$ contains the intersection of $\mathrm{MB}(B \cup \{p^k\})$ with the affine hull of $B \cup \{p^k\}$, to which set $p$ is quite close due to $z$ being small, we also get

\[
\frac{e_0}{r_0^2} = O(\sqrt{\varepsilon}),
\tag{13}
\]

where $e_0$ is the excess of $p$ w.r.t. the final ball $\mathrm{mb}_0$, as desired. This argument is not a strict proof for the correctness of our rejection criterion, but it explains why the latter works well in practice. In the code, $\varepsilon$ is chosen as $10^{-32}$. Because of (13), the relative error of a point w.r.t. the final ball is then expected to stay below $10^{-16}$ in magnitude. The latter value is the relative accuracy in the typical situations where the threshold criterion is not applied by the algorithm at all. Thus, $\varepsilon$ is chosen in such a way that even when the criterion comes in, the resulting error does not go up.
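The rejection test (12) and the arithmetic behind the choice of the threshold can be sketched in a few lines. The value `EPS = 1e-32` is the constant as read from the text above, the function name is hypothetical, and the test is written without a division so that a zero current radius does not raise an error (a choice made for this sketch).

```python
import math

EPS = 1e-32   # threshold epsilon, as read from the text


def push_is_ignored(z, r2_curr, eps=EPS):
    """Criterion (12): ignore the push if z / r2_curr < eps."""
    return z < eps * r2_curr


print(push_is_ignored(1e-40, 1.0))   # True: new point is numerically in the affine hull
print(push_is_ignored(2.0, 1.0))     # False: safe push
print(math.isclose(math.sqrt(EPS), 1e-16))   # True: sqrt(eps) matches double accuracy
```

With this threshold, the bound $O(\sqrt{\varepsilon})$ from (13) is of the same order as the relative accuracy of double-precision arithmetic.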



Checking 

While it is easy to verify that the computed ball is admissible in the sense that it contains all input points and has all support points on the boundary (approximately), its optimality does not yet follow from this; if there are fewer than $d + 1$ support points, many balls are admissible with respect to this definition. The following lemma gives an optimality condition.

Lemma 2. Let $S$ be a set of affinely independent points. $\mathrm{MB}(S)$ is the smallest enclosing ball of $S$ if and only if its center lies in the convex hull of $S$.

The statement seems to be folklore and can be proved e.g. by using the Karush-Kuhn-Tucker optimality conditions for constrained optimization [5], or by elementary methods.

From Section 2 we know that the algorithm should compute a support set $S$ that behaves according to the lemma; still, we would like to have a way to check this in order to safeguard against numerical errors that may lead to admissible balls which are too large. Under the device $A_B^{-1}$, this is very simple: the coefficients $\lambda_i$ we extract from system (2) in this case give us the desired information: exactly if they are all nonnegative, $S$ defines the optimal ball.

The weakness in this argumentation is that due to (possibly substantial) errors in $A_B^{-1}$, the $\lambda_i$ might appear positive, although they are not. One has to be aware that "checking" in this case only adds more plausibility to a seemingly correct result. Real checking would ultimately require the use of exact arithmetic, which is just not the point of this code.

Still, if the plausibility test fails (and some $\lambda_i$ turn out to be far below zero), we do know that something went wrong, which is important information in evaluating the code.

Unfortunately, in using the improved device $M_B$ during the computations, we do not have immediate access to the $\lambda_i$. To obtain them, we express $C$ as well as the points $Q_1, \ldots, Q_{m-1}$ with respect to a different basis of the linear span of $Q_1, \ldots, Q_{m-1}$. In this representation, the linear combination of the $Q_i$ that defines $C$ will be easy to deduce.

The basis we use will be the set of (pairwise orthogonal) vectors $Q_i - \bar{Q}_i$, $i = 1, \ldots, m-1$. From the update formula for the center (9) we immediately deduce that

\[
C = \sum_{i=1}^{m-1} f_i (Q_i - \bar{Q}_i),
\]

where $f_i$ is the value $e/z$ that was computed according to Lemma 1(iv) when pushing $q_i$. This means, the coordinates of $C$ in the new basis are $(f_1, \ldots, f_{m-1})$.

To get the representations of the $Q_i$, we start off by rewriting $M_B$ as

\[
M_B = \sum_{k=1}^{m-1} \frac{2}{z_k} (Q_k - \bar{Q}_k)(Q_k - \bar{Q}_k)^T,
\]

which follows from Lemma 1(v). Here, $z_k$ denotes the value $z$ we got when pushing point $q_k$.

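Now consider the point $Q_i$. We need to know the coefficient $\alpha_{ik}$ of $Q_k - \bar{Q}_k$ in the representation

\[
Q_i = \sum_{k=1}^{m-1} \alpha_{ik} (Q_k - \bar{Q}_k).
\]

With

\[
M_{B_i} := \sum_{k=1}^{i} \frac{2}{z_k} (Q_k - \bar{Q}_k)(Q_k - \bar{Q}_k)^T
\tag{14}
\]

we get

\[
M_{B_i} Q_i = Q_i
\]

(after adding $q_i$ to $B$, $Q_i$ projects to itself). Via (14), this entails $\alpha_{i,i+1} = \cdots = \alpha_{i,m-1} = 0$ and

\[
\alpha_{ik} = \frac{2}{z_k} (Q_k - \bar{Q}_k)^T Q_i, \qquad k \le i.
\]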

In particular we get $\alpha_{ii} = 1$. The coefficients $\lambda_i$ in the equation

\[
C = \sum_{i=1}^{m-1} \lambda_i Q_i
\]

are now easy to compute in the new representations of $C$ and the $Q_i$ we have just developed. For this, we need to solve the linear system

\[
\begin{pmatrix}
\alpha_{11} & \cdots & \alpha_{m-1,1} \\
 & \ddots & \vdots \\
 & & \alpha_{m-1,m-1}
\end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_{m-1} \end{pmatrix}
=
\begin{pmatrix} f_1 \\ \vdots \\ f_{m-1} \end{pmatrix}.
\]

This system is triangular: everything below the diagonal is zero, and the entries on the diagonal are 1. So we can get the $\lambda_i$ by a simple back substitution, according to

\[
\lambda_i = f_i - \sum_{k=i+1}^{m-1} \alpha_{ki} \lambda_k.
\]

Finally, we set

\[
\lambda_0 = 1 - \sum_{k=1}^{m-1} \lambda_k,
\]

and check whether all these values are nonnegative.
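The back substitution and the final sign check can be sketched as below. Indices are shifted to start at 0, so `f[i]` stands for $f_{i+1}$ and `alpha[k][i]` for $\alpha_{k+1,i+1}$ with `alpha[k]` holding the entries for $i \le k$; this layout and the function names are assumptions of the sketch.

```python
def barycentric_coefficients(f, alpha):
    """Return [lambda_0, lambda_1, ..., lambda_{m-1}] by back substitution."""
    n = len(f)
    lam = [0.0] * n
    for i in reversed(range(n)):
        lam[i] = f[i] - sum(alpha[k][i] * lam[k] for k in range(i + 1, n))
    return [1.0 - sum(lam)] + lam


def center_in_convex_hull(f, alpha):
    """Plausibility check of Lemma 2: all coefficients nonnegative."""
    return all(l >= 0.0 for l in barycentric_coefficients(f, alpha))


# Right triangle q0 = (0,0), q1 = (1,0), q2 = (1,1): f = (1/2, 1/2), alpha_21 = 1.
print(barycentric_coefficients([0.5, 0.5], [[1.0], [1.0, 1.0]]))   # [0.5, 0.0, 0.5]

# Obtuse triangle q0 = (0,0), q1 = (4,0), q2 = (2,0.5): circumcenter outside the hull.
print(center_in_convex_hull([0.5, -7.5], [[1.0], [0.5, 1.0]]))     # False
```

In the first case the center is the midpoint of the hypotenuse, so one coefficient is exactly zero and the check still passes; in the second, a clearly negative coefficient flags a ball that is admissible but not optimal.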

How much effort is necessary to determine the values $\alpha_{ik}$? Here comes the punch line: if we actually represent the matrix $M_B$ according to (14) and during the push of $q_i$ evaluate the product

\[
\bar{Q}_i = M_{B_{i-1}} Q_i
= \sum_{k=1}^{i-1} \frac{2}{z_k} (Q_k - \bar{Q}_k)(Q_k - \bar{Q}_k)^T Q_i
= \sum_{k=1}^{i-1} \alpha_{ik} (Q_k - \bar{Q}_k)
\]

according to this expansion, we have already computed $\alpha_{ik}$ by the time we need it for the checking!

Moreover, if we make representation (14) implicit by only storing the $z_k$ and the vectors $Q_k - \bar{Q}_k$, we can even perform the multiplication $M_{B_{i-1}} Q_i$ with $O(di)$ arithmetic operations, compared to $O(d^2)$ operations when we really keep $M_B$ as a matrix or a sum of matrices.

The resulting implementation of the ‘push’ routine is extremely simple and compact 
(about 50 lines of code), and it allows the checker to be implemented in 10 more lines 
of code. 

5 Experimental Results 

I have tested the algorithm on various point sets: random point sets (to evaluate the speed),
vertices of a regular simplex (to determine the dimension limits) and (almost) cospherical 
points (to check the degeneracy handling). In further rounds, all these examples have been 
equipped with ‘extra degeneracies’ obtained by duplicating input points, replacing them 
by ‘clouds’ of points very close together, or embedding them into a higher dimensional 
space. This covers all inputs that have ever been reported as problematic to me. A test 
suite (distributed with the code) automatically generates all these scenarios from the 
master point sets and prints out the results. 

In most cases, the correct ball is obtained by the pivoting method, while the move- 
to-front method frequently fails (results range from mildly wrong to wildly wrong on 
cospherical points, and under input point duplication resp. replacement by clouds). This 
means, although the move-to-front approach is still slightly faster than pivoting in low 
dimensions (see the results in the next paragraph), it is highly advisable to use the pivoting 
approach; it seems to work very well together with the robust update scheme based on
the matrix $M_B$, as described in Section 4. The main drawbacks of the move-to-front
method are its dependence on the order of the input points, and its higher number of 
push operations (the more you push, the more can go wrong). Of course, the input order 
can be randomly rearranged prior to computation (as originally suggested in [9]), but 
that eats up the gain in runtime over the pivoting method. On the other hand, if one does 
not rearrange, it is very easy to come up with bad input orders (try a set of points ordered 
along a line). 

Random point sets. I have tested the algorithm on random point sets up to dimension 30 to evaluate the speed of the method, in particular with respect to the relation between the pivoting and the move-to-front variant. Table 1 (left) shows the respective runtimes for 100,000 points randomly chosen in the $d$-dimensional unit cube, in logarithmic scale (averaged over 100 runs). All runtimes (excluding the time for generating and storing the points) have been obtained on a SUN Ultra-Sparc II (248 MHz), compiling with the GNU C++ compiler g++ version 2.8.1 and options -O3 -funroll-loops. The latter option advises the compiler to perform loop unrolling (and g++ does this to quite some extent). This is possible because the dimension is fixed at compile time via a template argument. By this mechanism, one also gets rid of dynamic storage management.

As it turns out, the move-to-front method is faster than the pivoting approach up to dimension 8 but then loses dramatically. In dimension 20, pivoting is already more than ten times faster. Both methods are exponential in the dimension, but for applications in low dimensions (e.g. $d = 3$), even 1,000,000 points can be handled in about two seconds.





Table 1. Runtime in seconds for 100,000 random points in dimension $d$: pivoting (solid line) and move-to-front (dotted line) (left). Runtime in seconds on regular $d$-simplex in dimension $d$ (right).



Vertices of a simplex. The results for random point sets suggest that dimension 30 is still feasible using the pivoting method. This, however, is not the case for all inputs. In high dimensions, the runtime is basically determined by the calls to the move-to-front method with point set $S \cup \{p^k\}$, $S$ the current support set. We know that $|S| \le d + 1$, but if the input is random, $|S|$ will frequently be smaller (in dimension 20, for example, the average number of support points turns out to be around 17). In this case, a 'pivot step' and therefore the complete algorithm is much faster than in the worst case. To test this worst case, I have chosen as input the vertices of a regular $d$-simplex in dimension $d$, spanned by the unit vectors. In this case, the number of support points is $d$. Table 1 (right) shows the result (move-to-front and pivoting variant behave similarly). Note that to solve the problem on 20 points in dimension 20, one needs about as long as for 100,000 random points in dimension 26!

As a conclusion, the method reaches its limits much earlier than in dimension 30, 
when it comes to the worst case. In dimension 20, however, you can still expect reasonable 
performance in any case. 
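For reference, the worst-case input is easy to generate, and its smallest enclosing ball is known in closed form: the unit vectors are affinely independent and equidistant from their centroid, which lies in their convex hull, so by Lemma 2 the centroid is the center. The sketch below builds the input and the expected ball; the function names are hypothetical.

```python
import math


def simplex_vertices(d):
    """The d unit vectors in dimension d."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def expected_ball(d):
    """Center (the centroid) and squared radius 1 - 1/d of the smallest enclosing ball."""
    return [1.0 / d] * d, 1.0 - 1.0 / d


d = 20
center, r2 = expected_ball(d)
# Every vertex is a support point: all d squared distances equal r2.
on_boundary = all(
    math.isclose(sum((v - c) ** 2 for v, c in zip(vertex, center)), r2)
    for vertex in simplex_vertices(d)
)
print(on_boundary, r2)   # True 0.95
```

Such a closed-form answer is convenient for validating an implementation in exactly the regime where every pivot step is as expensive as possible.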

Cospherical points. Here, the master point sets are exactly cocircular points in dimension 2, almost cospherical points in higher dimensions (obtained by scaling random vectors to unit length), a tessellation of the unit sphere in 3-space by longitude/latitude values, and vertices of a regular $d$-cube. While the pivoting method routinely handles most test scenarios, the move-to-front method mainly has problems with duplicated input points and slightly perturbed inputs. It may take very long and computes mildly wrong results in most cases. The slow behavior is induced by many failing push operations due to the value $z$ being too small, see Section 4. This causes many points which have mistakenly been treated as inside the current ball to reappear outside later.

The most difficult problems for the pivoting method arise from the set of 6144 integer points on the circle around the origin with squared radius $r^2 = 3728702916375125$. The set itself is handled without any rounding errors at all appearing in the result (this is only possible because $r^2$ still fits into a floating-point value of the C++ type double).
However, embedding this point set into 4-space (by adding zeros in the third and fourth 
coordinate), combined with a random perturbation by a relative amount of about 10^^*^ 
in each coordinate makes the algorithm fail occasionally. In these cases, the computed
support set does not have the origin in its convex hull, which is detected by the checking
routine.

6 Conclusion 

I have presented a simple, fast and robust code to compute smallest enclosing balls. 
The program is the last step so far in a chain of improvements and simplifications 
of the original program written back in 1991. The distinguishing feature is the nice 
interplay between a new high-level algorithm (the pivoting method) and improved low-level primitives (the $M_B$-based update scheme).

For dimensions d < 10, the method is extremely fast, beyond that it slows down a 
bit, and for d > 20 it is not suitable anymore in some cases. This is because every ‘pivot 
step’ (a call to the move-to-front method with few points) takes time exponential in d. 
Even slight improvements here would considerably boost the performance of the whole 
algorithm. At this point, it is important to note that high dimension is not prohibitive for 
the smallest enclosing ball problem itself, only for the method presented. Interior point 
methods, or ‘real’ simplex-type methods in the sense that the pivot step is a polynomial- 
time operation (see e.g. [2]) might be able to handle very high dimensions in practice,
but most likely at the cost of losing the simplicity and stability of the solution I gave 
here.

Abstract. We introduce an innovative decomposition technique which reduces a multi-dimensional searching problem to a sequence of one-dimensional problems, each one easily manageable in optimal time $\times$ space complexity using traditional searching strategies. The reduction has no additional storage requirement and the time complexity to reconstruct the result of the original multi-dimensional query is linear in the dimension.

More precisely, we show how to preprocess a set $S \subseteq \mathbb{N}^d$ of multi-dimensional objects into a data structure requiring $O(m \log n)$ space, where $m = |S|$ and $n$ is the maximum number of different values for each coordinate. The obtained data structure is implicit, i.e. does not use pointers, and is able to answer the exact match query in $7(d - 1)$ steps. Additionally, the model of computation required for querying the data structure is very simple; the only arithmetic operation needed is addition, and no shift operation is used.

The technique introduced, overcoming the multi-dimensional bottleneck, can also be applied to non-traditional models of computation such as external memory, distributed, and hierarchical environments. Additionally, we will show how the proposed technique permits the effective realizability of the well-known perfect hashing techniques on real data.

The algorithms for building the data structure are easy to implement and run in polynomial time.

1 Introduction 

The efficient representation of multi-dimensional point sets plays a central role in many large-scale computations, including, for instance, object management in distributed environments (CORBA, DCOM), object-oriented and deductive database management [2,5,25,10,19], and spatial and temporal data manipulation [20,24]. All these applications manage very large amounts of multi-attribute data. Such data can be considered as points in a $d$-dimensional space. Hence, the key research issue, in order to provide "good" implementations of these applications, is the design of an efficient data structure for searching in the $d$-dimensional space. A fundamental search operation is the exact match query, that is, testing the presence of a point in the multi-dimensional set when all its coordinates are specified. Another important operation is the prefix-partial match query, which looks for a set of points, possibly empty, for which only the first $k < d$ coordinates are specified.

J. Nešetřil (Ed.): ESA '99, LNCS 1643, pp. 339-353, 1999.

We deal with the exact match query by using an innovative decomposition technique which reduces a multi-dimensional searching problem to a sequence of one-dimensional problems, each one easily manageable in optimal time $\times$ space complexity using traditional searching strategies. The reduction requires no additional storage besides that required for the data, and the time complexity to reconstruct the result of the original multi-dimensional query is linear in the dimension. The technique introduced, overcoming the multi-dimensional bottleneck, can be applied in more general contexts, such as distributed and hierarchical environments. Additionally, it can be used to advantage, jointly with perfect hashing techniques, when dealing with real data.

The technique is based on two main steps. In the first step, we reduce the $d$-dimensional searching problem to a sequence of $d$ one-dimensional searching problems. In the second step, the multi-dimensional data is reconstructed using a set of $(d - 1)$ 2-place functions. Each function is represented using a new data structure derived from a decomposition of the 2-place functions into a set of "sparse" 2-place functions that are easily representable. The decomposition technique of 2-place functions is an application of a more general technique introduced in [22] and successively refined in [23] for testing reachability in general directed graphs. The same technique has been successfully applied in [21] to the problem of implicitly representing a general graph.

The data structure we present has the following characteristics: 

- general and deterministic. We represent any multi-dimensional point set and our space and time bounds are worst-case deterministic;

- space and time efficient. Exact match query requires $7(d - 1)$ steps and prefix-partial match $7(k - 1) + t$ steps, where $t$ is the number of points reported, using $O(m \log n)$ space, where $m$ is the size of the point set and $n$ is the maximum number of values a coordinate can receive;

- easy to implement. The algorithms used to build the data structure, although somewhat tricky to analyze, are very simple and run in polynomial time; no operations are needed for searching other than one-dimensional array accesses;

- simple computation model. The only arithmetic operation required for querying the data structure is addition, and no shift operation is used.

Due to its relevance, the multi-dimensional searching problem has been deeply investigated. In computational geometry and for spatial databases, the problem has been solved only for small values of the dimension [18,20] and the solutions proposed grow exponentially with $d$. The same problem has been studied for temporal databases [24]. In this case, we have empirical results only, and the worst case is unbounded. In a general setting, there are two major techniques for implementing multi-dimensional searching: trees and hashing. For the first, several data structures have been developed as $d$-dimensional versions of data structures for the one-dimensional problem (e.g. B-trees [3], compacted tries [17], digital search trees [14]). In this case, even though the space complexity is optimal, exact match queries require a logarithmic number of steps in the worst case. Hashing and perfect hashing techniques have the drawback that each search performed may require the evaluation of computationally complex functions [8,7,6]. Hence, numerically robust implementations are required. With our technique, each search only requires a constant number of table accesses, and the addresses to be accessed are computed with only a constant number of additions.

Concerning the comparison of our technique with the less powerful computational models considered in the so-called word-RAM approach [9], namely the RISC model, there are two issues to be considered. First, our technique does not need to use a shift operation, which may require at least $\log m$ additions to be simulated, where $m$ is the problem size. Second, the overall space needed for computations in the word-RAM model is $O(2^w)$ bits, where $w$ is the word size, and for the model to be of interest this quantity has to be considerably larger than the problem size $m$, namely $2^w \gg m$ ([9], p. 371). Contrast this with the overall space needed in our approach that, expressed in terms of $m$, can be written as $O(m \log^2 m)$.

The paper is structured as follows. In Section 2 we describe the representation of the multi-dimensional problem by means of a sequence of 2-place functions. In Section 3 we give some definitions and notations, and present some decomposition theorems. Using these theorems, in Section 4 we describe the data structure for representing a 2-place function and, hence, a multi-dimensional point set. Then, in Section 5 we present some applications of our technique. Finally, in Section 6 we outline some open problems and future research directions.



2 Problem Representation 



In this section, we show first how to reduce a multi-dimensional problem to a set of 
one-dimensional problems, and then how to reconstruct the original problem. 

Let $S \subseteq \mathbb{N}^d$ be given, with $m = |S|$. Let $x = (x_1, \ldots, x_d) \in S$, and let $n_i = |\{x_i : x \in S\}|$. The reduction is defined by the following set of functions:

\[
g_i : \mathbb{N} \to \{1, \ldots, n_i\}, \qquad 1 \le i \le d.
\tag{1}
\]

Each function $g_i$ maps the values of a coordinate to a set of integers of bounded size. This mapping can be easily represented with data structures for one-dimensional searching, such as B-trees or perfect hashing tables. Without loss of generality, from now on we assume $S \subseteq U^d$, where $U = \{1, \ldots, n\}$, with $n = \max_i \{n_i\}$.
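The first step of the reduction can be sketched with ordinary dictionaries standing in for the one-dimensional search structures. The order-preserving ranking used here is only one possible choice of mapping, and the function names are assumptions of this sketch.

```python
def build_reduction(keys):
    """One map per coordinate, sending each occurring value to 1..n_i."""
    d = len(keys[0])
    maps = []
    for i in range(d):
        values = sorted({key[i] for key in keys})
        maps.append({v: rank for rank, v in enumerate(values, start=1)})
    return maps


def reduce_key(maps, key):
    """Apply the per-coordinate maps to a key; raises KeyError for unseen values."""
    return tuple(m[x] for m, x in zip(maps, key))


keys = [(10, 700), (20, 700), (10, 900)]
maps = build_reduction(keys)
print([reduce_key(maps, k) for k in keys])   # [(1, 1), (2, 1), (1, 2)]
print(max(len(m) for m in maps))             # 2, the value of n for this set
```

A query value that does not occur in some coordinate cannot belong to the set, so a failed lookup in any of the d maps already answers an exact match query negatively.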

Hence, let $x = (x_1, \ldots, x_d) \in S \subseteq U^d$ be a generic key of $S$, where $x_i$ denotes the value of the $i$-th coordinate. Let $a = (a_1, a_2, \ldots, a_d)$ be a value in $U^d$. We denote with $a(i)$ the subsequence of its first $i$ coordinates, namely $a(i) = (a_1, a_2, \ldots, a_i)$, called a partial value or the prefix (of length $i$) of $a$. We write $b(j) \subset a(i)$ when $j \le i$ and $b_k = a_k$, for $k = 1, 2, \ldots, j$. In the same way, we define the prefix for a key in $S$.

Let $S(a(i))$ be the subset of $S$ containing all keys that are coincident on the prefix $a(i)$. Note that $|S(a(d-1))| \le n$ and $|S(a(d))| \le 1$. For any $a(i)$ such that $|S(a(i))| \ge 1$ and $\nexists\, b(j) \subset a(i)$ such that $S \supset S(b(j)) \supset S(a(i))$, we say that $a(i)$ is the maximal shortest common prefix of $S(a(i))$ with respect to $S$. We assume that there is no maximal shortest common prefix $a(i)$ such that $S(a(i)) = S$, since otherwise we can consider a reduced dimension universe, by simply deleting the maximal shortest common prefix from every key.

The representation mechanism we use for keys is based on a suitable coding of subsets of keys with common prefixes of increasing length, starting from the maximal shortest common prefixes. We denote with $f_i$ a 2-place function such that $f_i : U \times U \to S$. We code keys using these functions in an incremental way.

Given a set $T$ of keys, we denote with $s_l^T$, $1 \le l \le k_T$, the $l$-th key in a fixed, but arbitrarily chosen, total ordering of the $k_T$ keys in $T$. The choice of the order is immaterial: we use it only to make the description clearer.

Let us now assume $a(i)$ is a maximal shortest common prefix with respect to $S$. For reasons that will become clearer in the following, we only take into account maximal shortest common prefixes longer than 1. We then represent $S(a(i))$, $i > 1$, with the following technique.

First we represent the $i - 1$ smallest elements in $S(a(i))$ as follows:

\[
\begin{aligned}
f_1(a_1, a_2) &= s_1^{S(a(i))} \\
&\;\;\vdots \\
f_{i-1}(a_{i-1}, a_i) &= s_{i-1}^{S(a(i))}
\end{aligned}
\]

Now, if $k_{S(a(i))} \le i - 1$, we have represented all elements in $S(a(i))$ and we are done. Otherwise we still have to represent the $k_{S(a(i))} - (i - 1)$ remaining elements in

\[
S' = S(a(i)) \setminus \bigcup_{l=1}^{i-1} s_l^{S(a(i))}.
\]

All keys in $S'$ can then be partitioned in subsets, possibly just one, each containing
keys with a common prefix $a(i+j) \supset a(i)$, and such that, for each subset $S'_r$, $a(i+j_r)$
is the maximal shortest common prefix of $S'_r = S(a(i+j_r)) \cap S'$ with respect to $S'$.
We now represent the $k_{S'_r}$ keys in $S'_r$ by recursively applying the same approach.
Namely, we first represent the $j_r$ smallest keys in $S'_r$ as follows:

$$\begin{cases} f_i(a_1 \ldots a_i, a_{i+1}) = s^1_{S'_r} \\ \quad\ldots \\ f_{i+j_r-1}(a_1 \ldots a_{i+j_r-1}, a_{i+j_r}) = s^{j_r}_{S'_r} \end{cases}$$

Now, if $k_{S'_r} \le j_r$, we have represented all elements in $S'_r$ and we are done. Otherwise,
we still have to represent the $k_{S'_r} - j_r$ remaining elements in
$S''_r = S'_r \setminus \bigcup_{l=1}^{j_r} s^l_{S'_r}$.
All keys in $S''_r$ can then be partitioned in subsets, possibly just one, each containing
keys with a common prefix $a(i+j_r+h) \supset a(i+j_r)$, and such that, for each subset $S''_{r,q}$,
$a(i+j_r+h_{r,q})$ is the maximal shortest common prefix of $S''_{r,q} = S(a(i+j_r+h_{r,q})) \cap S''_r$
with respect to $S''_r$. The representation process then goes on recursively.

We now show an example of the application of the definitions introduced above. 



Example 1. Assume d = 6 and n = 9. Consider a set S = {233121, 233133, 233135, 
233146, 234566, 234577, 234621, 234622, 234623, 343456}. Then there are only two 
maximal shortest common prefixes with respect to S, namely 23 of length 2 and 343456 
of length 6. 

We then set $f_1(2, 3) = 233121$ and $f_1(3, 4) = 343456$: since $k_{S(23)} > 2 - 1$ while
$k_{S(343456)} \le 6 - 1$, $S(343456)$ has been completely represented, while for keys
remaining in $S' = S(23) \setminus \{233121\}$ we have to recursively apply the same technique.
The maximal shortest common prefixes in $S' = \{233133, 233135, 233146, 234566,
234577, 234621, 234622, 234623\}$ are 2331 of length 2 + 2 and 234 of length 2 + 1. We have
$S'_1 = S(2331) \cap S' = \{233133, 233135, 233146\}$ and $S'_2 = S(234) \cap S' = \{234566,
234577, 234621, 234622, 234623\}$.

We then set $f_2(23, 3) = 233133$ and $f_3(233, 1) = 233135$; we also set $f_2(23, 4) =
234566$. Since $k_{S'_1} > 2$ and $k_{S'_2} > 1$, both for keys remaining in $S''_1 = S'_1 \setminus
\{233133, 233135\}$ and for those in $S''_2 = S'_2 \setminus \{234566\}$ we have to recursively apply
the same technique. We obtain the following settings: $f_4(2331, 4) = 233146$, $f_3(234, 5) =
234577$, $f_3(234, 6) = 234621$, $f_4(2346, 2) = 234622$, and $f_5(23462, 3) = 234623$.

Given a $d$-dimensional set of points $S \subseteq U^d$, a point $x = a_1, a_2, \ldots, a_d$ can be
searched by incrementally evaluating the 2-place functions $f_i$. At each step $i$, with
$i \in \{1, \ldots, d-1\}$, two cases are possible: either $f_i(a_1 \ldots a_i, a_{i+1}) = x$ and we are done,
or the search continues with the evaluation of $f_{i+1}$. It is trivial to verify that
the search ends reporting $x$ if and only if $x \in S$. In the next section we show how to
efficiently represent 2-place functions so that the above search strategy can be executed
in a constant number of steps.
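The coding and the search can be illustrated with a short Python sketch. It is not the construction of the text verbatim: as an assumption, the functions $f_i$ are filled by a greedy first-fit pass over the sorted keys (each key takes the first free slot along its own prefixes), which reproduces the assignments of Example 1. Keys are modelled as digit strings and each $f_i$ as a dictionary; the names `build_functions` and `exact_match` are hypothetical.

```python
def build_functions(keys):
    """Greedy first-fit filling of f_1 .. f_{d-1} (an assumed simplification).

    f[i] maps (prefix of length i, coordinate i+1) to a full key.
    Keys are assumed distinct and of equal length d.
    """
    d = len(keys[0])
    f = {i: {} for i in range(1, d)}
    for key in sorted(keys):
        for i in range(1, d):
            slot = (key[:i], key[i])
            if slot not in f[i]:
                f[i][slot] = key      # first free slot along the key's prefixes
                break
    return f


def exact_match(f, x):
    """Evaluate f_1, f_2, ... until one of them returns x itself."""
    for i in range(1, len(x)):
        if f[i].get((x[:i], x[i])) == x:
            return True
    return False


# The key set of Example 1.
S = ["233121", "233133", "233135", "233146", "234566",
     "234577", "234621", "234622", "234623", "343456"]
f = build_functions(S)
```

The slot indexed by the full key (prefix of length $d-1$ plus the last coordinate) can only be claimed by that key, so the greedy pass always finds a place for every key.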

3 2-place Functions Representation 

In order to state the main result of this section we need to recall some definitions and 
give new notations. 

3.1 Definitions 

A bipartite graph $G = (A \cup B, E)$ is a graph with $A \cap B = \emptyset$ and edge set $E \subseteq A \times B$.

Given a 2-place function $f : A \times B \to \mathbb{N}$, a unique labeled bipartite graph
$G = (A \cup B, E)$ can be built, such that the label of $(x, y) \in E$ is equal to $z$ if and only
if $x \in A$, $y \in B$, and $f(x, y) = z \in \mathbb{N}$. Hence, the representation of a 2-place function
is equivalent to testing adjacency in the bipartite graph and looking up the label associated to
the edge, if it exists. For ease of exposition, in the following, we will deal with labeled
bipartite graphs instead of 2-place functions. Moreover, from now on, $n_A$ and $n_B$ denote
the number of vertices in $A$ and $B$, respectively, and $m$ is the number of edges of the
bipartite graph.

Given a bipartite graph $G = (A \cup B, E)$, $x \in A \cup B$ is adjacent to $y \in A \cup B$
if $(x, y) \in E$. Given a vertex $x$, the set of its adjacent vertices is denoted by $\alpha(x)$;
$\delta(x) = |\alpha(x)|$ is the degree of $x$. The notation is extended to a set $S$ of vertices as
$\alpha(S) = \bigcup_{x \in S} \alpha(x)$ and $\delta(S) = \sum_{x \in S} \delta(x)$. The maximum degree among vertices in $S$ is denoted
by $\Delta_S$. In particular, $\Delta_A$ and $\Delta_B$ denote the maximum degree among vertices in $A$
and $B$, respectively. A bipartite graph is regular if all vertices have the same degree
$\Delta = \Delta_A = \Delta_B$. A bipartite graph $G = (A \cup B, E)$ is bi-regular if all vertices in $A$
have the same degree $\Delta_A$ and all vertices in $B$ have the same degree $\Delta_B$.

Given a set of vertices $S \subseteq A$ or $S \subseteq B$, $\delta_S(x) = |\alpha(x) \cap S|$ denotes the number
of vertices in $S$ adjacent to $x$. Furthermore, $\alpha_j(S) = \{x \in \alpha(S) : \delta_S(x) = j\}$ denotes
the set of vertices in $\alpha(S)$ incident to $S$ with exactly $j$ edges. Given a set of vertices
$S \subseteq A \cup B$, the sub-bipartite graph induced by $S$ is $G' = (S, E_S)$, with
$E_S = E \cap (S \times S)$.

A h-cluster $S$ is a set of vertices, either in $A$ or in $B$, s.t. $\delta_S(x) \le h$, $\forall x \in \alpha(S)$. A
1-cluster is simply called cluster.
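As a small illustration of the last definition, the following Python sketch checks whether a set of vertices is a h-cluster. The adjacency representation (a dictionary from each vertex to the set of its neighbours on the other side) and the name `is_h_cluster` are assumptions made for the example.

```python
def is_h_cluster(S, adj, h):
    """Return True if every vertex adjacent to S has at most h neighbours in S.

    adj maps each vertex of S to the set of its adjacent vertices.
    """
    degree_in_S = {}                      # delta_S(y) for every y in alpha(S)
    for x in S:
        for y in adj[x]:
            degree_in_S[y] = degree_in_S.get(y, 0) + 1
    return all(c <= h for c in degree_in_S.values())


adj = {"a1": {"b1", "b2"}, "a2": {"b2"}, "a3": {"b2", "b3"}}
```

With this graph, `{a1, a2}` is a 2-cluster but not a cluster, because `b2` is adjacent to both of its vertices.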

3.2 Partitioning into h-Clusters

We present an algorithm which, given a bipartite graph $G = (A \cup B, E)$, computes a
h-cluster $C \subseteq A$, with $h = \lceil \log n_B \rceil$; hence, the sub-bipartite graph induced by $C \cup B$ has
the property $\Delta_{\alpha(C)} \le h$. Of course, this can be done trivially if $C$ consists of at most
$h$ vertices. Somewhat surprisingly, it turns out that with a clever selection of the vertices of the
h-cluster, we can find a h-cluster of $\Omega(n_A / \Delta_B)$ vertices, hence a significant fraction of all
vertices in $A$.

The idea behind the algorithm derives from the following observation: when we add
a new vertex $x$ to the h-cluster, then for each vertex $y$ in $\alpha(x)$, its degree $\delta_C(y)$ with
respect to $C$ increases by one. A trivial approach would be to just check that for each
vertex $y \in \alpha(C) \cap \alpha(x)$, $\delta_C(y) \le h - 1$ holds; this guarantees $\Delta_{\alpha(C)} \le h$ after the
insertion. Unfortunately, in the long run this strategy does not work. A smarter strategy
must look forward, to guarantee not only that the current choice is correct, but also that it
does not restrict successive choices too much. A new vertex $x$ is added to the cluster in
$h$ successive steps, at each step $j$ observing how $x$ affects the number $|\alpha_{j-1}(C)|$ of
vertices adjacent to $C$ having degree $j - 1$ with respect to $C$. At each step the selection
is passed by those vertices whose contribution $|\alpha_{j-1}(C) \cap \alpha(x)|$ is not too large,
where "not too large" means no more than $t$ times the average value over all candidates
at step $j$, for some suitable choice of $t$.

We will prove that this strategy causes the number $|\alpha_h(C)|$ of vertices adjacent to $C$
having degree $h$ with respect to $C$ to increase very slowly, thus ensuring that this number
is less than 1 until at least $n_A / (\beta \Delta_B)$ vertices have been chosen, for a fixed constant $\beta$.

The algorithm is presented in Figure 1; from now on, $C_i$ denotes the h-cluster at the
end of step $i$, and $S_{i,j}$ the set of vertices, to be added to $C_{i-1}$, that passed the selection
step $j$. Furthermore, the notation $\alpha_0(C_i)$ is extended to denote the set $B - \alpha(C_i)$ of all
vertices in $B$ not adjacent to $C_i$.

Lemma 2. $|S_{i,j}| \ge (n_A - i + 1)\left(1 - \frac{1}{t}\right)^j$.

Proof. At each step $j$ we select those vertices $x \in S_{i,j-1}$ such that $|\alpha_{j-1}(C_{i-1}) \cap \alpha(x)|$
is no more than $t$ times the average value $\mu_{i,j-1}$ over all vertices in $S_{i,j-1}$. If a set
of $n$ non-negative integers has average value $\mu$, then at most $n/t$ elements have value
greater than $t\mu$ and, hence, at least $n(1 - 1/t)$ elements have value at most $t\mu$, thus
$|S_{i,j}| \ge |S_{i,j-1}|\left(1 - \frac{1}{t}\right)$, with $|S_{i,0}| = n_A - i + 1$; the Lemma follows.

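Lemma 3. Let $n_{i,j} = |\alpha_j(C_i)|$, that is the number of vertices $y$ in $\alpha(C_i)$ s.t. $\delta_{C_i}(y) = j$.
Then

$$n_{i,j} \le \frac{n_B}{j!} \left[ \frac{t \Delta_B (i - 1)}{(n_A - i + 1)\left(1 - \frac{1}{t}\right)^{j-1}} \right]^j .$$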
    $C_0 \leftarrow \emptyset$;
    $i \leftarrow 0$;
    repeat
        $i \leftarrow i + 1$;
        $S_{i,0} \leftarrow A - C_{i-1}$;
        for $j \leftarrow 1$ to $h$ do begin
            $\mu_{i,j-1} \leftarrow \frac{\sum_{x \in S_{i,j-1}} |\alpha_{j-1}(C_{i-1}) \cap \alpha(x)|}{|S_{i,j-1}|}$;
            $S_{i,j} \leftarrow \{x \in S_{i,j-1} : |\alpha_{j-1}(C_{i-1}) \cap \alpha(x)| \le t \mu_{i,j-1}\}$;
        end;
        pick a vertex $x \in S_{i,h}$;
        $C_i \leftarrow C_{i-1} \cup \{x\}$;
    until $S_{i,h} = \emptyset$;
    end.

Fig. 1. Algorithm Select.
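The selection scheme of Figure 1 can be sketched in Python as follows. This is an illustrative reading, not the algorithm as stated: the stopping rule is an assumption (the sketch stops as soon as no surviving candidate can be added without giving some vertex of $B$ more than $h$ neighbours in the cluster), and the comparison against $t$ times the average is done in integer arithmetic. The name `select_h_cluster` and the dictionary-based graph are hypothetical.

```python
def select_h_cluster(adj, h, t):
    """Grow an h-cluster C inside A by the 't times the average' filtering.

    adj maps each vertex of A to the set of its neighbours in B;
    t is an integer >= 1.
    """
    cluster = []
    inside = {}                    # inside[y] = delta_C(y), neighbours of y in C
    remaining = set(adj)
    while remaining:
        cands = sorted(remaining)  # S_{i,0} = A - C_{i-1}
        for j in range(1, h + 1):
            # hits[x] = |alpha_{j-1}(C) ∩ alpha(x)| (alpha_0 = vertices not adjacent to C)
            hits = {x: sum(1 for y in adj[x] if inside.get(y, 0) == j - 1)
                    for x in cands}
            total = sum(hits.values())
            # keep x when hits[x] <= t * average, i.e. hits[x] * |S| <= t * total
            cands = [x for x in cands if hits[x] * len(cands) <= t * total]
        # assumed stopping rule: never break the h-cluster property
        safe = [x for x in cands if all(inside.get(y, 0) < h for y in adj[x])]
        if not safe:
            break
        x = safe[0]
        cluster.append(x)
        remaining.discard(x)
        for y in adj[x]:
            inside[y] = inside.get(y, 0) + 1
    return cluster


adj = {x: {x % 3, 3 + x % 2} for x in range(6)}   # small hypothetical graph
C = select_h_cluster(adj, h=2, t=2)
```

The candidate list never becomes empty during filtering, since the candidate with the smallest count is never above the average.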



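Proof. The proof is by induction on the step $j$.

Base step: $j = 1$. At step $(i, 1)$, $\mu_{i,0}$ is the average degree of the $n_A - i + 1$ vertices in
$A - C_{i-1}$ with respect to vertices in $\alpha_0(C_{i-1})$, that is, vertices not adjacent to $C_{i-1}$. Thus,
$\mu_{i,0} \le \frac{m}{n_A - i + 1}$ and a vertex $x$ that is added to $S_{i,1}$ verifies
$|\alpha_0(C_{i-1}) \cap \alpha(x)| \le \frac{tm}{n_A - i + 1}$.
If $x$ is added to $C_{i-1}$, $n_{i-1,1}$ is increased by at most $\frac{tm}{n_A - i + 1}$ new vertices. Hence,

$$n_{i,1} \le n_{i-1,1} + \frac{tm}{n_A - i + 1} \le \sum_{k=1}^{i-1} \frac{tm}{n_A - k} \le \frac{tm(i - 1)}{n_A - i + 1} .$$

Since $m \le \Delta_B n_B$, $n_{i,1} \le n_B \frac{t \Delta_B (i - 1)}{n_A - i + 1}$ and the base step is proved.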



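Induction step: $j - 1 \to j$. At step $(i, j)$, $\mu_{i,j-1}$ is the average degree of candidate
vertices in $S_{i,j-1}$ with respect to vertices in $\alpha_{j-1}(C_{i-1})$. By Lemma 2 and since the
total number of edges outgoing from $\alpha_{j-1}(C_{i-1})$ is at most $\Delta_B n_{i-1,j-1}$, we have

$$\mu_{i,j-1} \le \frac{\Delta_B n_{i-1,j-1}}{(n_A - i + 1)\left(1 - \frac{1}{t}\right)^{j-1}} .$$

A vertex $x$ is added to $S_{i,j}$ if it verifies $|\alpha_{j-1}(C_{i-1}) \cap \alpha(x)| \le t \mu_{i,j-1}$. Hence, if $x$ is
added to $C_{i-1}$, $n_{i-1,j}$ is increased by at most $t \mu_{i,j-1}$ new vertices. Thus,

$$\begin{aligned}
n_{i,j} &\le n_{i-1,j} + \frac{t \Delta_B n_{i-1,j-1}}{(n_A - i + 1)\left(1 - \frac{1}{t}\right)^{j-1}}
 \le \sum_{k=1}^{i-1} \frac{t \Delta_B n_{k,j-1}}{(n_A - k)\left(1 - \frac{1}{t}\right)^{j-1}} \\
&\le \frac{t \Delta_B}{(n_A - i + 1)\left(1 - \frac{1}{t}\right)^{j-1}} \sum_{k=1}^{i-1} \frac{n_B}{(j-1)!} \left[ \frac{t \Delta_B (k - 1)}{(n_A - k + 1)\left(1 - \frac{1}{t}\right)^{j-2}} \right]^{j-1} \\
&\le \frac{n_B}{(j-1)!} \left[ \frac{t \Delta_B}{(n_A - i + 1)\left(1 - \frac{1}{t}\right)^{j-1}} \right]^{j} \sum_{k=1}^{i-1} (k - 1)^{j-1} \\
&\le \frac{n_B}{j!} \left[ \frac{t \Delta_B (i - 1)}{(n_A - i + 1)\left(1 - \frac{1}{t}\right)^{j-1}} \right]^{j} .
\end{aligned}$$

This concludes the induction step.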

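Theorem 4. Let $G = (A \cup B, E)$ be a bipartite graph. For $h \ge \lfloor \log n_B \rfloor$, Algorithm
Select finds a h-cluster $C$ of

$$\frac{n_A}{(2e^2 + 1)\Delta_B}$$

vertices in time $O(|C|\, n_A \Delta_A)$.

Proof. If $t = h \ge 2$, then $\left(1 - \frac{1}{t}\right)^{h-1} \ge \frac{1}{e}$. Let $i_{max}$ be the value of index $i$ at the end
of the execution of Algorithm Select. Considering that $h! \ge \left(\frac{h}{e}\right)^h$, Lemma 3 implies:

$$n_{i_{max},h+1} \le n_B \left[ \frac{e^2 \Delta_B (i_{max} - 1)}{n_A - i_{max} + 1} \right]^{h+1} .$$

If

$$i_{max} \le \frac{n_A}{2e^2 \Delta_B + 1}$$

then

$$\frac{e^2 \Delta_B (i_{max} - 1)}{n_A - i_{max} + 1} < \frac{1}{2} ,$$

hence, $n_{i_{max},h+1} < 1$ for
$h \ge \lfloor \log n_B \rfloor \ge \lceil \log n_B \rceil - 1$, and $C_{i_{max}}$ is a h-cluster.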



From now on $\beta$ denotes the constant $2e^2 + 1 < 16$. Theorem 4 leads to the following
corollary, which will be used to derive the space complexity of the proposed data structure.

Corollary 5. Let $G = (A \cup B, E)$ be a bipartite graph. For $h \ge \lfloor \log n_B \rfloor$, $A$ can be
partitioned into $\lceil 2\beta \Delta_B \rceil \cdot \lceil \log n_A \rceil$ h-clusters. The time complexity is $O(n_A^2 \Delta_A)$.

Proof. The sequence of clusters is computed by repeatedly selecting a h-cluster and
removing its vertices from $A$. Let us suppose that after $k$ iterations the number $n'_A$ of
vertices remaining in $A$ is greater than $n_A / 2$, but after $k + 1$ iterations it is less than or equal
to $n_A / 2$. By Theorem 4, during the first $k$ iterations, algorithm Select finds h-clusters of
at least $\frac{n_A / 2}{\beta \Delta_B}$ vertices. Hence, in $k$ iterations at least $k \frac{n_A / 2}{\beta \Delta_B}$ vertices have been
removed from $A$, so $k \le 2\beta \Delta_B$.

We can repeat the same argument on the remaining vertices, each time halving the
number of vertices still in $A$; this can obviously be repeated no more than $\lceil \log n_A \rceil$ times.



3.3 Partitioning into Clusters

The following lemma characterizes the complexity of partitioning a bipartite graph $G =
(A \cup B, E)$ into clusters (1-clusters). Clusters will be used to build the ground data
structure upon which the others are based.

Lemma 6. Let $G = (A \cup B, E)$ be a bipartite graph. $B$ can be partitioned in $1 +
\Delta_B(\Delta_A - 1)$ clusters. The time required is $O(n_B \Delta_A \Delta_B)$.

Proof. Let $B_1, \ldots, B_k$ be a partition of $B$ into clusters so that $B_i$ is a maximal cluster
for $B - \bigcup_{j=1}^{i-1} B_j$. Each vertex $y \in B_i$ has at most $\Delta_B$ adjacent vertices, and each of
them has at most $\Delta_A - 1$ adjacent vertices different from $y$. Hence, a vertex $y \in B_i$
prevents at most $\Delta_B(\Delta_A - 1)$ vertices from being included in the same cluster. Since the
cluster is maximal, each vertex in $B$ either has been chosen in $B_i$ or has been excluded
from it, so $n_B \le |B_i|(1 + \Delta_B(\Delta_A - 1))$. Hence $|B_i| \ge \frac{n_B}{1 + \Delta_B(\Delta_A - 1)}$. The lemma
follows.

Note that the bound given by Lemma 6 is tight, since there exists an infinite class of
regular bipartite graphs that cannot be decomposed in fewer than $1 + \Delta(\Delta - 1)$ clusters [11].
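The partition used in the proof of Lemma 6 can be sketched as a first-fit pass over $B$: each vertex goes to the first cluster in which it shares no neighbour with the vertices already placed there. First-fit yields clusters that are maximal in the sense of the proof, since a vertex is only passed on to a later cluster when it conflicts with a member of every earlier one. The function name and the dictionary-based graph are assumptions of the example.

```python
def partition_into_clusters(adj_B):
    """First-fit partition of B into clusters (1-clusters).

    adj_B maps each vertex y of B to the set of its neighbours in A.
    In every returned cluster no two vertices share a neighbour.
    """
    clusters = []                              # pairs (members, covered neighbours)
    for y in sorted(adj_B):
        for members, covered in clusters:
            if covered.isdisjoint(adj_B[y]):   # y conflicts with nobody here
                members.append(y)
                covered |= adj_B[y]
                break
        else:
            clusters.append(([y], set(adj_B[y])))
    return [members for members, _ in clusters]


adj_B = {"p": {0}, "q": {1}, "r": {0, 1}}      # Delta_A = 2, Delta_B = 2
clusters = partition_into_clusters(adj_B)
```

In this small graph the bound of Lemma 6 is $1 + 2(2 - 1) = 3$ clusters, and the pass uses two.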



4 The Data Structure 

In this section we present the data structure for the multi-dimensional searching problem.

Based upon the decomposition theorems given in Section 3.2, we first present
a data structure for labeled bipartite graphs that allows us to represent a bipartite graph $G =
(A \cup B, E)$ in $O(n + m \log n)$ space, and to test if two vertices are adjacent in a
constant number of steps. For the sake of clarity, we first describe a simpler data structure
that represents bi-regular bipartite graphs, then extend the result to represent all bipartite
graphs.

4.1 Representing Bi-Regular Bipartite Graphs

Given a bi-regular bipartite graph $G = (A \cup B, E)$, we partition $A$ in h-clusters according
to Corollary 5; hence, we obtain a sequence of at most $\lceil 2\beta \Delta_B h \rceil$ bipartite graphs
$G_i = (A_i \cup B, E_i)$, where $A_i$ is the $i$-th h-cluster and $E_i = E \cap (A_i \times B)$. Then we partition
the vertex set $B$ of each bipartite graph $G_i = (A_i \cup B, E_i)$ into clusters. Lemma 6 ensures
that each bipartite graph is decomposed into at most $1 + h(\Delta_A - 1)$ clusters.

We define the following arrays:

- hclus of size $n_A$; $i$ = hclus[$x$] is the index of the unique h-cluster $A_i$ to which
$x \in A$ belongs;

- clus of size $n_B \times \lceil 2\beta \Delta_B h \rceil$; $j$ = clus[$y$, $i$] is the index of the unique cluster
in $G_i$ to which $y \in B$ belongs;

- join of size $n_A \times (1 + h(\Delta_A - 1))$; $y$ = join[$x$, $j$] is the unique possible vertex
$y \in B$ adjacent to $x$ in the $j$-th cluster in the unique $i$-th h-cluster to which $x$
belongs.

Adjacency on the bipartite graph can be tested in 3 steps since $(x, y) \in E$ if and only if,
given $i$ = hclus[$x$] and $j$ = clus[$y$, $i$], $y$ = join[$x$, $j$] holds. The total space required
is

$$O(n_A + n_B \lceil 2\beta \Delta_B h \rceil + n_A (1 + h(\Delta_A - 1))) = O((n + m) \log n) .$$

Note that if $m < n$ then isolated vertices can be trivially represented, so the space
complexity becomes $O(n + m \log n)$.
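The three-lookup adjacency test can be sketched in Python as follows. The sketch makes several assumptions: the partition of $A$ into h-clusters is computed by a simple first-fit pass rather than by Algorithm Select, the clusters of $B$ are also computed first-fit, and the arrays are modelled as dictionaries, so no size bound is claimed. The names `build` and `adjacent` are hypothetical.

```python
def build(A, B, edges, h):
    """Build hclus, clus and join for the bipartite graph (A ∪ B, edges)."""
    nbrs_A = {x: set() for x in A}
    nbrs_B = {y: set() for y in B}
    for x, y in edges:
        nbrs_A[x].add(y)
        nbrs_B[y].add(x)

    # hclus[x] = index i of the h-cluster A_i containing x (first-fit, assumed)
    hclus, load = {}, []          # load[i][y] = neighbours of y already in A_i
    for x in A:
        for i, cnt in enumerate(load):
            if all(cnt.get(y, 0) < h for y in nbrs_A[x]):
                break
        else:
            load.append({})
            i = len(load) - 1
        hclus[x] = i
        for y in nbrs_A[x]:
            load[i][y] = load[i].get(y, 0) + 1

    # clus[y, i] = cluster of y in G_i; join[x, j] = the neighbour of x in cluster j
    clus, join = {}, {}
    for i in range(len(load)):
        covered = []              # covered[j] = vertices of A_i adjacent to cluster j
        for y in B:
            n_i = {x for x in nbrs_B[y] if hclus[x] == i}
            if not n_i:
                continue          # y is isolated in G_i: entry left empty
            for j, cov in enumerate(covered):
                if cov.isdisjoint(n_i):
                    break
            else:
                covered.append(set())
                j = len(covered) - 1
            covered[j] |= n_i
            clus[(y, i)] = j
            for x in n_i:
                join[(x, j)] = y
    return hclus, clus, join


def adjacent(x, y, hclus, clus, join):
    """Three lookups: hclus, then clus, then join."""
    i = hclus[x]
    j = clus.get((y, i))
    return j is not None and join.get((x, j)) == y


A = ["x1", "x2", "x3"]
B = ["y1", "y2"]
E = {("x1", "y1"), ("x2", "y1"), ("x3", "y1"), ("x1", "y2")}
tables = build(A, B, E, h=2)
```

Because every vertex $x$ has at most one neighbour in each cluster of its own $G_i$, the entry `join[(x, j)]` is never overwritten.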

4.2 Representing Bipartite Graphs and 2-place Functions

We now show how to obtain for general bipartite graphs the same results as for
bi-regular graphs. Given a bipartite graph $G = (A \cup B, E)$, we first partition $B$ into
maximal subsets $B_i$, s.t. $\forall y \in B_i$, $2^i \le \delta(y) < 2^{i+1}$. We obtain a sequence of at most
$h = \lceil \log n \rceil$ bipartite graphs $G_i = (A \cup B_i, E_i)$, where $B_i$ is the $i$-th subset of $B$ and
$E_i = E \cap (A \times B_i)$.
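The first partitioning step groups the vertices of $B$ by the position of the leading bit of their degree. A minimal Python sketch, assuming degrees are given as a dictionary and that isolated vertices are handled separately (the name `degree_buckets` is hypothetical):

```python
def degree_buckets(degree):
    """Group vertices so that bucket i holds those with 2**i <= degree < 2**(i+1)."""
    buckets = {}
    for y, d in degree.items():
        if d > 0:                                   # isolated vertices are skipped
            buckets.setdefault(d.bit_length() - 1, []).append(y)
    return buckets


buckets = degree_buckets({"u": 1, "v": 2, "w": 3, "z": 4, "iso": 0})
```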

Then, according to Corollary 5, for each such bipartite graph $G_i$ we partition $A$
into h-clusters $A_{i,j}$, obtaining a sequence of at most $\lceil 2^{i+1} \beta \rceil h$ bipartite graphs $G_{i,j}$,
and further partition each h-cluster into at most $h$ subsets $A_{i,j,k}$ s.t. $\forall x \in A_{i,j,k}$, $2^k \le
\delta_{B_i}(x) < 2^{k+1}$, obtaining a sequence of bipartite graphs $G_{i,j,k}$.

Finally, for each bipartite graph $G_{i,j,k}$, we partition the set $B_i$ into clusters; Lemma 6
ensures that each bipartite graph $G_{i,j,k}$ is decomposed into at most $1 + h(\Delta_{A_{i,j,k}} - 1)$
clusters.

We define the following arrays:

- range of size $n_B$; $i$ = range[$y$] is the index of the unique subset $B_i$ to which $y$
belongs;

- hclus of size $n_A \times h$; $j$ = hclus[$x$, $i$] is the index of the unique h-cluster $A_{i,j}$ to
which $x \in A$ belongs in $G_i$;

- subs of size $n_A \times h$; $k$ = subs[$x$, $i$] is the index of the unique subset $A_{i,j,k}$ in the
unique h-cluster to which $x \in A$ belongs in $G_i$;

- For each vertex $y \in B_i$, we define an array ranges$_y$ of size $\lceil 2^{i+1} \beta \rceil h$; ranges$_y$[$j$]
is a reference to the array clus, which contains the cluster indices of $y$ in all subsets
$A_{i,j,k}$; it is empty if $y$ is not adjacent to any vertex in $A_{i,j}$. The total space needed
for array ranges$_y$ for all $y \in B$ is

$$O\Big(\sum_{B_i} \sum_{y \in B_i} \lceil 2^{i+1} \beta \rceil h\Big) = O\Big(\sum_{B_i} \sum_{y \in B_i} \delta(y) \beta h\Big) = O(mh) .$$

Reading ranges$_y$[$j$] requires 2 steps, one to read the initial address of the array
given $y$, and one to access its $j$-th element.

- For each vertex $y \in B_i$, and each h-cluster $A_{i,j}$ connected to $y$, we define an array
clus; clus[$k$] is the index of the unique cluster in $G_{i,j,k}$ to which $y \in B_i$ belongs;
it is empty if $y$ is not adjacent to any vertex in $A_{i,j,k}$. For each vertex $y \in B_i$, since
$2^i \le \delta(y) < 2^{i+1}$, at most $2^{i+1}$ such arrays are defined, each of them having size
$h$. Hence, the total space needed for all arrays clus is

$$O\Big(\sum_{B_i} \sum_{y \in B_i} 2^{i+1} h\Big) = O\Big(\sum_{B_i} \sum_{y \in B_i} \delta(y) h\Big) = O(mh) .$$

- joins of size $n_A \times h$; joins[$x$, $i$] is a reference to the array join, which contains
all vertices in $B_i$ adjacent to $x$. It is empty if $x$ is not adjacent to any vertex in $B_i$.

- For each vertex $x \in A_{i,j,k}$, and each set $B_i$ connected to $x$, we define an array join
of size $(1 + h(2^{k+1} - 1))$; join[$l$] is the (unique) possible vertex adjacent to $x$ in
the $l$-th cluster of $G_{i,j,k}$; it is empty if $x$ is not adjacent to any vertex in the $l$-th cluster
of $G_{i,j,k}$. For each vertex $x \in A$, the space needed for all its related arrays join is
$O(\sum_{B_i} 2h \delta_{B_i}(x)) = O(h \delta(x))$, so the total space for arrays join for all $x \in A$ is
$O(mh)$.

Adjacency on the bipartite graph can be tested in constant time since $(x, y) \in E$ if and
only if, given $i$ = range[$y$], $j$ = hclus[$x$, $i$], $k$ = subs[$x$, $i$], clus = ranges$_y$[$j$],
$l$ = clus[$k$] and join = joins[$x$, $i$], $y$ = join[$l$] holds. The test requires 7 steps. The
total space required is

$$O(n_B + n_A h + mh) = O((n + m) \log n) .$$

Also in this case, if $m < n$ then isolated vertices can be trivially represented, so the
space complexity becomes $O(n + m \log n)$.

From the above discussion, we have the following theorem:

Theorem 7. There exists a data structure that represents a bipartite graph with $n$
vertices and $m$ edges in space $O(n + m \log n)$. Vertex adjacency can be tested in 7 steps.
Preprocessing time is $O(n^2 \Delta)$, where $\Delta$ is the maximum vertex degree.

The representation of a 2-place function and the lookup operation which, given two
objects, returns a value associated to the pair, is equivalent to the following: given a
bipartite graph $G = (A \cup B, E)$, a labeling function $\mathcal{L} : E \to \mathbb{N}$, and $x \in A$,
$y \in B$, if $(x, y) \in E$ then return $\mathcal{L}(x, y)$. This can be easily accomplished with the previously
described data structure by extending join[$l$], whenever $(x, y) \in E$, to contain both
the (unique) possible vertex $y$ adjacent to $x$ in the $l$-th cluster of $G_{i,j,k}$ and the value
$\mathcal{L}(x, y)$. This leads to the following theorem:

Theorem 8. There exists a data structure that represents a 2-place function of size $m$
between objects from a domain of size $n$ in space $O(n + m \log n)$. The lookup operation
requires 7 steps. Preprocessing time is $O(n^3)$.

4.3 Representing Multi-dimensional Data

Consider a point set $S \subseteq U^d$, with $m = |S|$ and $n = |U|$. Let $(f_1, \ldots, f_{d-1})$ be the
sequence of 2-place functions representing $S$, as described in Section 2. Additionally,
let $m_i$ be the size of the 2-place function $f_i$, for $1 \le i \le d - 1$. By the
definition, we have $m = \sum_{i=1}^{d-1} m_i$. Moreover, for any $i$, $n_i \le m_i$, $n_i$ being the size of
the domain set of $f_i$. By Theorem 8, we can state the following theorem:

Theorem 9. There exists an implicit data structure that represents a set $S \subseteq U^d$ of
multi-dimensional points in space $O(m \log n)$. The exact match and prefix-partial match
queries can be performed in $7(d - 1)$ and $7(d - 1) + t$ steps, respectively, where $t$ is the
number of points reported. The preprocessing time is $O(d n^3)$.

5 Extensions 

5.1 External Memory Data Structure 

Due to its nature, the above described data structure can be efficiently applied to
secondary storage. In this paper, we consider the standard two-level I/O model introduced by
Aggarwal and Vitter in [1]. In this case, we can devise a powerful compression technique
leading to a space optimal data structure.

In the data structure described in Section 4.2, the critical arrays are ranges, clus, and
join, that is those requiring a total space $O(mh)$, which in terms of external memory
storage implies $O(mh/B)$ blocks. The following lemma counts the number of non-empty
entries in these arrays:

Lemma 10. Let $k'$ be the total number of non-empty entries in arrays ranges$_y$ for all
$y \in B$; $k''$ be the total number of non-empty entries in all arrays clus; and $k'''$ be the
total number of non-empty entries in arrays join. Then $k' \le m$, $k'' \le m$, $k''' \le m$.

Proof. If ranges$_y$[$j$] is not empty, then some edge $(x, y)$ belongs to $G_{i,j}$; on the other
hand, there is a unique bipartite graph $G_{i,j}$ containing such an edge. Hence, the total number
of non-empty entries in arrays ranges$_y$ for all $y \in B$ is at most $m$.

If clus[$k$] is not empty for some vertex $y \in B_i$ and some h-cluster $A_{i,j}$ connected
to $y$, then some edge $(x, y)$ belongs to $G_{i,j,k}$; since there is a unique bipartite graph
$G_{i,j,k}$ containing such an edge, the total number of non-empty entries in all arrays clus is
at most $m$.

If join[$l$] is not empty, then some edge $(x, y)$ belongs to $G_{i,j,k}$; there is a unique
bipartite graph $G_{i,j,k}$ containing such an edge; the total number of non-empty entries in
arrays join for all $x \in A$ is at most $m$.

Let $a$ be an array of size $k$. We partition $a$ into intervals of $B$ elements, and represent
each interval by a reference to the block containing the non-empty entries in that interval.
It is easy to see that an array $a$ of size $k$ with $k'$ non-empty entries can be represented in
$O(\frac{k}{B} + k')$ space, thus in $O((\frac{k}{B} + k')/B)$ blocks; furthermore, one access to $a[i]$ maps to
2 memory accesses, hence the external memory version of Theorem 8 and Theorem 9
can be stated as follows:

Theorem 11. There exists an external-memory data structure that represents a 2-place
function of size $m$ between objects from a domain of size $n$ with $O((n + m)/B)$ blocks.
The lookup operation requires 10 I/Os.

Theorem 12. There exists an external memory implicit data structure that represents
a set $S \subseteq \mathbb{N}^d$ of multi-dimensional points with $O(m/B)$ blocks. The exact match and
prefix-partial match queries require $10(d - 1)$ and $10(d - 1) + t/B$ I/Os, respectively,
where $t$ is the number of points reported by the partial match query.
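The array compression described before Theorem 11 can be sketched in Python. The sketch stores one reference per interval of $B$ positions and packs the non-empty entries of each interval together; an access then costs two lookups. The layout of a packed block (a dictionary from offset to value) and the names `compress` and `access` are assumptions of the example, and empty entries are modelled as `None`.

```python
def compress(a, B):
    """Split a into intervals of B positions and pack the non-empty entries."""
    refs = []      # one reference per interval (None when the interval is empty)
    blocks = []    # packed non-empty entries, as {offset in interval: value}
    for start in range(0, len(a), B):
        entries = {off: v for off, v in enumerate(a[start:start + B])
                   if v is not None}
        if entries:
            refs.append(len(blocks))
            blocks.append(entries)
        else:
            refs.append(None)
    return refs, blocks


def access(refs, blocks, B, i):
    """Return a[i] using two lookups: the interval reference, then the block."""
    ref = refs[i // B]                 # first memory access
    if ref is None:
        return None
    return blocks[ref].get(i % B)      # second memory access


a = [None] * 20
a[3], a[17] = 7, 9
refs, blocks = compress(a, B=4)
```

Here `refs` has one entry per interval (five in total) and the blocks hold only the two non-empty entries.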
5.2 Incremental Exact Match Queries

The representation we propose for a multi-dimensional point set $S$ allows us to perform
the exact match operation efficiently in a more general context. In fact we can define the
incremental exact match query, where the coordinates are specified incrementally, that
is, the search starts when the first coordinate is given, and proceeds by refining the search
space as soon as the other coordinates are specified. This definition of exact search is
particularly useful in distributed environments where the request for a query is expressed
by sending messages along communication links [15,16,4,12,13] and not all coordinates
reside on the same machine.

Another field of application of the incremental exact match query is interactive
exploratory search on the Web. In this case the user can specify the search keys one by
one so as to obtain intermediate results.

Also, the incremental exact match query is particularly practical when dealing with
point sets from a space of very high dimension (in the order of thousands of keys). In this
case we can manage the query in a distributed environment by specifying only $k \ll d$
keys at a time in order to prevent network congestion and to obtain a more reliable answer.



5.3 Improving Conventional Searching Data Structures

Our decomposition technique can be positively applied to one-dimensional hashing and
perfect hashing when dealing with real keys. Let $w$ be the machine word length, and
$K \gg w$ the key length. We can divide each key in $K/w$ sub-keys, and reduce the original
one-dimensional searching problem to a multi-dimensional searching problem, which
can be solved with our technique with no additional storage and with a constant number
of I/Os.
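The reduction from a long key to a multi-dimensional point can be sketched as follows. This is only an illustration of the splitting step: it assumes keys are byte strings of a common fixed length, takes the word length in bytes, and pads the last word with zero bytes; the name `split_key` is hypothetical.

```python
def split_key(key: bytes, word_bytes: int) -> tuple:
    """Cut a key into consecutive machine words, one coordinate per word."""
    padded = key + b"\x00" * (-len(key) % word_bytes)   # pad to a whole number of words
    return tuple(int.from_bytes(padded[i:i + word_bytes], "big")
                 for i in range(0, len(padded), word_bytes))


point = split_key(b"\x01\x02\x03", 2)
```

With fixed-length keys the padding is the same for every key, so distinct keys map to distinct points.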

Another important application is to the trie data structure. With a technique similar
to the one described above, we can consider larger node sizes.



6 Open Problems 



One important open problem is that of dynamizing the data structure; even an
incremental-only version of the data structure would be a useful improvement. Another important research
direction is to extend the operation set to include other operations useful for the
management of a multi-dimensional data set (e.g. range queries, retrieval of maximal elements,
orthogonal convex hull, etc.).

We are currently carrying out extensive experimentation on secondary memory,
based on data sets derived from a business application. This experimentation activity
is still at its beginning, its main purpose being to test the effective speedup
in the lookup operation and the overall size of the representation on these data sets.
Preliminary experimental results show that our data structure is very
fast and works very well in the average case.
Abstract. We revisit classical geometric search problems under the assumption
of rational coordinates. Our main result is a tight bound for point separation, ie,
to determine whether $n$ given points lie on one side of a query line. We show that
with polynomial storage the query time is $\Theta(\log b / \log \log b)$, where $b$ is the bit
length of the rationals used in specifying the line and the points. The lower bound
holds in Yao's cell probe model with storage in $n^{O(1)}$ and word size in $b^{O(1)}$. By
duality, this provides a tight lower bound on the complexity of the polygon point
enclosure problem: given a polygon in the plane, is a query point in it?



1 Introduction 

Preprocess $n$ points in the plane, using $n^{O(1)}$ storage, so that one can quickly tell whether
a query line passes entirely below or above the points. This point separation problem is
dual to deciding whether a query point lies inside a convex polygon. As is well known,
this can be done in $O(\log n)$ query time and $O(n)$ storage, which is optimal in the
algebraic decision tree model [8,9]. This is suitable for infinite-precision
computations [3,4,20], but it does not allow for bucketing or any form of hashing. Unfortunately,
these happen to be essential devices in practice. In fact, the computational geometry
literature is rife with examples of speed-ups derived from finite-precision encodings of
point coordinates, eg, range searching on a grid [17], nearest neighbor searching [11,12],
segment intersection [13], point location [16].

To prove lower bounds is usually difficult; even more so when hashing is allowed. 
Algebraic models are inadequate and one must turn to more general frameworks such as 
the cell probe model [18] or, in the case of range searching, the arithmetic model [7,19]. 
As a searching (rather than computing) problem, point separation lends itself naturally 
to the cell probe model and this is where we confine our discussion. Our main interest 
is in pinpointing what sort of query time can or cannot be achieved with polynomial 
storage. Note that some restriction on storage is essential since constant query time is 
trivially achieved with exponential space. 

Let $P$ be a set of $n$ points in the plane, whose coordinates are rationals of the form $p/q$,
where $p$ and $q$ are $b$-bit integers. A cell probe algorithm for point separation consists of a
table of size $n^c$, with each cell holding up to $b^d$ bits, for some arbitrarily large constants
$c$, $d$. A query is answered by looking up a certain number of cells and outputting yes or
no, depending on the information gathered. For lower bound purposes, the query time
counts only the number of cells that are looked up during the computation.

* This work was supported in part by NSF Grant CCR-96-23768, ARO Grant DAAH04-96-1-0181,
NEC Research Institute, Ecole Polytechnique, and INRIA.

J. Nešetřil (Ed.): ESA'99, LNCS 1643, pp. 354-365, 1999.

Theorem 1. Given any cell-probe algorithm for point separation, there exist an input
of $n$ points and a query line that require $\Omega(\log b / \log \log b)$ time. The lower bound is
tight.

The upper bound can be achieved on a standard unit-cost RAM. Take the convex
hull of the points and, given the query line, search for the edges whose slopes are nearest
that of the line. Following local examination of the relative heights of the line and
edge endpoints, conclude whether there is point separation or not. This is elementary
computational geometry and details can be skipped. The main point is that the problem
reduces to predecessor searching with respect to slopes (rational numbers over $O(b)$
bits), which can be done optimally using a recent algorithm of Beame and Fich [2]. Their
algorithm preprocesses $n$ integers in $[0, N]$, so that the predecessor of any query integer
can be found in $O(\log \log N / \log \log \log N)$ time, using $n^{O(1)}$ storage. By appropriate
scaling and truncation, their scheme can be used for predecessor searching over the
rationals, with the query time becoming $O(\log b / \log \log b)$, for rationals with $O(b)$-bit
numerators and denominators.
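The reduction to predecessor searching on slopes can be sketched in Python. This is a simplified stand-in, not the scheme referred to above: the predecessor search is an ordinary binary search (`bisect`), so the query takes $O(\log n)$ time rather than the bound of [2]. The sketch assumes integer point coordinates and a non-vertical query line $y = mx + c$; the function names are hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction


def lower_hull(points):
    """Lower convex hull (left to right) and the increasing slopes of its edges."""
    lowest = {}
    for x, y in points:                        # keep the lowest point on each vertical
        lowest[x] = min(y, lowest.get(x, y))
    hull = []
    for pt in sorted(lowest.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # keep hull[-1] only if hull[-2] -> hull[-1] -> pt is a strict left turn
            if (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1) > 0:
                break
            hull.pop()
        hull.append(pt)
    slopes = [Fraction(q[1] - p[1], q[0] - p[0]) for p, q in zip(hull, hull[1:])]
    return hull, slopes


def min_offset(hull, slopes, m):
    """Minimum of y - m*x over the hull, located by a search on the edge slopes."""
    x, y = hull[bisect_left(slopes, m)]
    return y - m * x


def preprocess(points):
    mirrored = [(x, -y) for x, y in points]    # upper hull = mirrored lower hull
    return lower_hull(points), lower_hull(mirrored)


def one_sided(structure, m, c):
    """True if all points lie strictly above, or all strictly below, y = m*x + c."""
    (lo_hull, lo_slopes), (up_hull, up_slopes) = structure
    lowest = min_offset(lo_hull, lo_slopes, m)       # min of y - m*x
    highest = -min_offset(up_hull, up_slopes, -m)    # max of y - m*x
    return c < lowest or c > highest


pts = [(0, 0), (2, 0), (1, 3), (1, 1)]
structure = preprocess(pts)
```

The line lies below all points exactly when $c$ is smaller than the minimum of $y - mx$, and above all points when $c$ exceeds the maximum; both extremes are attained at hull vertices.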

2 The Complexity of Point Separation 

The input consists of a set $P$ of $n$ points in $\mathbb{R}^2$, which is encoded in a table $T$ of size
$n^c$, where $c$ is an arbitrarily large constant. To simplify the notation we can replace $c$
by $\max\{c, d\}$, and require that each cell should hold at most $w = b^c$ bits. A cell probe
algorithm is characterized by a table assignment procedure (ie, a function mapping any
$P$ to an assignment of the table $T$ to actual values) together with an infinite sequence of
functions $f_1, f_2$, etc. Given a query $\ell$ (ie, a certain line in $\mathbb{R}^2$), we evaluate the index $f_1(\ell)$
and look up the table entry $T[f_1(\ell)]$. If $T[f_1(\ell)]$ encodes whether $\ell$ separates the point
set or not, we answer the query and terminate. Otherwise, we evaluate $f_2(\ell, T[f_1(\ell)])$
and look up the entry $T[f_2(\ell, T[f_1(\ell)])]$, and we iterate in this fashion until a cell probe
finally reveals the desired answer. Note that such a framework is so general that it easily
encompasses every known solution to the point separation problem.

We use Miltersen's reformulation [15] of a cell probe algorithm as a communication
complexity game between two players [14]. Alice chooses a set $\mathcal{L}_1$ of candidate queries
(ie, a set of lines in the plane), while Bob decides on a collection $\mathcal{P}_1$ of $n$-point sets. Note
that each pair $(\ell, P) \in \mathcal{L}_1 \times \mathcal{P}_1$ specifies a problem instance. Alice and Bob's task is then
to exhibit a problem instance $(\ell, P) \in \mathcal{L}_1 \times \mathcal{P}_1$ that requires $\Omega(\log b / \log \log b)$ probes
in $T$ to answer. They do that by simulating each probe by a round in a communication
complexity game.

The possible values of the index $f_1(\ell)$ partition $\mathcal{L}_1$ into equivalence classes.
Alice chooses one of them and sends to Bob the corresponding value of $f_1(\ell)$. Of all the
possible $2^w$ assignments of the entry $T[f_1(\ell)]$ Bob chooses one of them and narrows
down his candidate set $\mathcal{P}_1$ to the set $\mathcal{P}_2$ of point sets leading to that chosen value of
$T[f_1(\ell)]$. Bob sends back to Alice his choice of $T[f_1(\ell)]$. Knowing $\ell$ and $T[f_1(\ell)]$,
Alice chooses a value for $f_2(\ell, T[f_1(\ell)])$ and communicates it to Bob, etc. Each round
$k$ produces a new pair $(\mathcal{L}_{k+1}, \mathcal{P}_{k+1})$ with the property that, for all queries in $\mathcal{L}_{k+1}$ and
all point sets in $\mathcal{P}_{k+1}$, Bob and Alice exchange the same information during the first
$k$ rounds, which are thus unable to distinguish among any of the problem instances in
$\mathcal{L}_{k+1} \times \mathcal{P}_{k+1}$.

We say a query line (resp. point set) is active at the beginning of round $k$ if it belongs
to $\mathcal{L}_k$ (resp. $\mathcal{P}_k$). The set $\mathcal{L}_k \times \mathcal{P}_k$ is called unresolved if it contains at least two problem
instances $(\ell, P)$ and $(\ell', P')$ with different yes/no outcomes: in such a case, Bob and
Alice need to proceed with round $k$, and the cost of the protocol (i.e., the minimum number
of rounds necessary) is at least $k$. We show that for some suitable $n = n(b)$, given any
cell probe table assignment procedure, there exist a starting set $\mathcal{L}_1$ of query lines and a
starting collection $\mathcal{P}_1$ of $n$-point sets in the plane that allow Bob and Alice to produce a
nested sequence of unresolved sets

\[ \mathcal{L}_1 \times \mathcal{P}_1 \supseteq \cdots \supseteq \mathcal{L}_t \times \mathcal{P}_t, \]

where $t = \Theta(\log b / \log\log b)$.
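As an informal illustration (not part of the proof), a single round of the game can be
simulated on toy data. The sketch below assumes two hypothetical callables, `index_of`
(standing in for $f_1$) and `cell_value` (standing in for the table assignment), and it simply
keeps the largest equivalence class on each side. That choice is a simplification of ours:
the actual protocol selects classes through the heavy nodes and product distributions
introduced in the following subsections.

```python
from collections import defaultdict

def play_round(queries, point_sets, index_of, cell_value):
    """Simulate one probe as one round of the two-player game."""
    # Alice: group her queries by the table index they would probe.
    by_index = defaultdict(list)
    for q in queries:
        by_index[index_of(q)].append(q)
    index, kept_queries = max(by_index.items(), key=lambda kv: len(kv[1]))

    # Bob: group his point sets by the content they assign to that cell.
    by_value = defaultdict(list)
    for p in point_sets:
        by_value[cell_value(p, index)].append(p)
    value, kept_sets = max(by_value.items(), key=lambda kv: len(kv[1]))
    return index, value, kept_queries, kept_sets

# Toy instance: queries are integers, point sets are pairs of integers.
queries = list(range(16))
point_sets = [frozenset({a, b}) for a in range(6) for b in range(a + 1, 6)]
index, value, L2, P2 = play_round(
    queries, point_sets,
    index_of=lambda q: q % 4,                            # hypothetical f_1
    cell_value=lambda p, i: sum(x % 4 == i for x in p),  # hypothetical table
)
```

After the round, every query in `L2` probes the same cell and every point set in `P2`
stores the same value there, so the exchanged messages cannot tell the surviving
instances apart.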

The protocol between Bob and Alice builds on our earlier work on approximate 
searching over the Hamming cube [5], which itself borrows ideas from the work of 
Ajtai [1] on predecessor searching. A protocol for predecessor queries of a similar flavor
was recently devised independently by Beame and Fich [2]. 



2.1 Points and Lines 

Let $p_i$ denote the point $(i, i^2)$, and given $i < j$, let $a_{ij} = \frac{1}{2}(i + j,\, i^2 + j^2)$ and
$b_{ij} = ((i + j)/2,\, ij)$. Any of Bob's $n$-point sets $P$ is of the form

\[ P = \{ p_{i_1}, X_{i_1 i_2}, p_{i_2}, X_{i_2 i_3}, \ldots, p_{i_{s-1}}, X_{i_{s-1} i_s}, p_{i_s} \}, \]

for some $i_1 < \cdots < i_s$, where $n = 2s - 1$ and $X$ denotes the symbol $a$ or $b$ (not
necessarily the same one throughout the sequence). Thus, $P$ can be specified by an
index set $I = I(P) = \{i_1, \ldots, i_s\}$ consisting of $s$ distinct $b$-bit integers and a bit vector
$\sigma = \sigma(P)$ of length $s - 1$ specifying the $X$'s. For technical reasons, we require that all
the integers of the index set $I$ be even.

The starting query set $\mathcal{L}_1$ consists of the lines of the form $y = 2kx - k^2$, for all
odd $b$-bit integers $k$. Note that this is the equation of the line through $p_k$ tangent to
the parabola $y = x^2$. The number of bits needed to encode any point coordinate or
line coefficient is $2b$ (and not $b$, a minor technicality). Note that the problem does not
become suddenly easier with other representations such as $\alpha x + \beta y = 1$, and that for
the purposes of our lower bound, all such representations are essentially equivalent. The
following is immediate.

Lemma 2. Let $p_{i_j}$ and $p_{i_{j+1}}$ be two points of $P$ and let $\ell$ be the line $y = 2kx - k^2$,
where $i_j < k < i_{j+1}$. The line $\ell$ separates the point set $P$ if and only if the symbol $X$
in $X_{i_j i_{j+1}}$ is of type $b$.
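Lemma 2 can be checked by direct computation: for a point $(x, y)$ the quantity
$y - (2kx - k^2)$ equals $(i - k)^2$ at $p_i$, equals $\frac{1}{2}((i-k)^2 + (j-k)^2)$ at $a_{ij}$, and equals
$(k - i)(k - j)$ at $b_{ij}$, so only a point $b_{ij}$ with $i < k < j$ falls below the line. The
following sketch verifies this on a small instance with exact rational arithmetic. The
helper names are ours, and we read "separates" as "has points of $P$ strictly on both
sides", which is an assumption about the intended definition.

```python
from fractions import Fraction

def side(point, k):
    """Sign of y - (2kx - k^2): +1 above the tangent at p_k, -1 below, 0 on it."""
    x, y = point
    v = y - (2 * k * x - k * k)
    return (v > 0) - (v < 0)

def build_point_set(indices, symbols):
    """Points p_i = (i, i^2) interleaved with a_ij or b_ij, as chosen by `symbols`."""
    pts = [(Fraction(indices[0]), Fraction(indices[0] ** 2))]
    for i, j, s in zip(indices, indices[1:], symbols):
        if s == "a":
            pts.append((Fraction(i + j, 2), Fraction(i * i + j * j, 2)))
        else:
            pts.append((Fraction(i + j, 2), Fraction(i * j)))
        pts.append((Fraction(j), Fraction(j * j)))
    return pts

def separates(pts, k):
    """True if the line y = 2kx - k^2 has points strictly on both sides."""
    signs = {side(p, k) for p in pts}
    return 1 in signs and -1 in signs

indices = [2, 6, 10, 16]      # even integers, as required of the index set
symbols = ["a", "b", "a"]     # the bit vector sigma, written with letters
pts = build_point_set(indices, symbols)
```

With this input, the odd values $k = 7, 9$ (between 6 and 10, where the symbol is $b$) give
separating lines, while every odd $k$ in the other two gaps does not.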

2.2 A Hierarchy of Tree Contractions

Keeping control of Alice's query lines is quite simple. The same cannot be said of Bob's
point sets. Not only must Bob's collections of point sets be kept large, but they must
include point sets of all shapes (but not sizes; remember that their size $n$ is fixed). This
variety is meant to make the algorithm's task more difficult. Some point sets must
stretch widely with big gaps between consecutive points, while others must be confined
to narrow intervals. For this reason, we cannot define point sets by picking points at
random uniformly. Instead, we use a tree and a hierarchy of contractions of subtrees to
define intervals from which we can specify the point sets.

Consider the perfect binary tree whose leaves (nodes of depth $b$) correspond to the
integers $0$ through $2^b - 1$, and let $\mathcal{T}_1$ denote its subtree of depth $d^t$ sharing its root, where¹



log b 

2 log log b 



and d = [c^ log b\ 



( 1 ) 



We assume throughout that the bit size $b$ and the constant $c$ are both suitably large.
Note that $b$ greatly exceeds $d^t$ and so the tree $\mathcal{T}_1$ is well defined. Given a node $v$ of the
tree $\mathcal{T}_1$, let $\mathcal{T}_1(v)$ denote its subtree of depth $d^{t-1}$ rooted at $v$. Contract all the edges of
$\mathcal{T}_1$ except those whose (lower) incident node happens to be a leaf of $\mathcal{T}_1(v)$, for some
node $v$ of depth at most $d^t - d^{t-1}$ and divisible by $d^{t-1}$. This transforms the tree $\mathcal{T}_1$
into a smaller one, denoted $\mathcal{U}_1$, of depth $d$. Note that the depth-one subtree formed by an
internal node $v$ of $\mathcal{U}_1$ and its children forms a contraction of the tree $\mathcal{T}_1(v)$ (Fig. 2).

Repeating this process leads to the construction of $\mathcal{U}_k$ for $1 < k < t$. Given an
internal node $v$ of $\mathcal{U}_{k-1}$, the depth-one tree formed by $v$ and its children is associated
with the subtree $\mathcal{T}_{k-1}(v)$, which now plays the role of $\mathcal{T}_1$ earlier, and is renamed $\mathcal{T}_k$. For
any node $u \in \mathcal{T}_k$ of depth at most $d^{t-k+1} - d^{t-k}$ and divisible by $d^{t-k}$, let $\mathcal{T}_k(u)$ denote
the subtree of $\mathcal{T}_k$ of depth $d^{t-k}$ rooted at $u$; as before, turn the leaves of $\mathcal{T}_k(u)$ into the
children of $u$ by contracting the relevant edges. This transforms $\mathcal{T}_k$ into the desired tree
$\mathcal{U}_k$ of depth $d$.

Fig. 2. The tree $\mathcal{T}_1$ and its contraction into $\mathcal{U}_1$.

¹ All logarithms are to the base two.

The contraction process is the same for all $k < t$, but not for $k = t$. We simply make
all the leaves of $\mathcal{T}_t$ into the children of the root and remove the other internal nodes, which
produces a depth-one tree $\mathcal{U}_t$ with $2^d$ leaves. Although $\mathcal{T}_k$ is defined nondeterministically,
it is always a perfectly balanced binary tree of depth $d^{t-k+1}$.

Lemma 3. Any internal node of any $\mathcal{U}_k$ has exactly $2^{d^{t-k}}$ children if $k < t$, and $2^d$
children if $k = t$.
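To make the counting in Lemma 3 concrete, the contraction can be carried out explicitly on
a very small tree. The sketch below is only an illustration under toy parameters
($d = t = 2$, whereas the text assumes both are large); the function name and the
`(depth, index)` encoding of nodes are our own conventions.

```python
def contract(depth, block):
    """Contract a perfect binary tree of the given depth, keeping only the
    nodes whose depth is a multiple of `block`.

    Nodes are (depth, index) pairs with 0 <= index < 2**depth. The result maps
    each kept internal node to its children in the contracted tree.
    """
    assert depth % block == 0
    children = {}
    for dep in range(0, depth, block):
        for idx in range(2 ** dep):
            # The leaves of the height-`block` subtree rooted at (dep, idx)
            # become the children of that node.
            children[(dep, idx)] = [
                (dep + block, idx * 2 ** block + r) for r in range(2 ** block)
            ]
    return children

d, t = 2, 2                            # toy values only
U1 = contract(d ** t, d ** (t - 1))    # T_1 has depth d^t, blocks have height d^(t-1)
leaves = {c for kids in U1.values() for c in kids if c not in U1}
```

Here the contracted tree has depth $d = 2$, each internal node has $2^{d^{t-1}} = 4$ children, and
the $2^{d^t} = 16$ leaves of the original tree are preserved.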

2.3 A Product Space Construction

We define any $\mathcal{P}_k$ by means of a distribution $\mathcal{D}_k$. We specify a lower bound on the
probability that a random point set $P_k$ drawn from $\mathcal{D}_k$ is active prior to round $k$, i.e.,
belongs to $\mathcal{P}_k$.

• Distribution $\mathcal{D}_1$: A random $P_1$ is defined by picking a random index set $I_1$ (more on this
below) and, independently, a random bit vector $\sigma_1$ uniformly distributed in $\{0,1\}^{s-1}$. $I_1$
is defined recursively in terms of $I_2, \ldots, I_t$. Each $I_k$ is defined with respect to a certain
tree $\mathcal{U}_k$. Any node $v$ in any $\mathcal{U}_k$ is naturally associated with an interval of integers between
$0$ and $2^b - 1$ of size larger than any fixed constant (go back to the node $u$ of $\mathcal{T}_1$ to which
it corresponds to see why): call the smallest even integer in that interval the mark point
of $v$. We define a random index set $I_1$ by setting $k = 1$ in the procedure below:

- For k = t, a random Ik (within some Pk) is formed by the mark points of nV 
nodes selected at random, uniformly without replacement, among the leaves of the 
depth-one tree Uk- 

- For k < t,a random p (within some Pk) is defined in two stages: 

[1] For each j = 1,2, d — I, choose nodes of Uk of depth j at random, 
uniformly without replacement, among the nodes of depth j that are not des- 
cendants of nodes chosen at lower depth (< j). The {d — nodes selected 
are said to be picked by p- 

[2] For each node v picked by p, recursively choose a random within 7fc+i = 
Pk{v). The union of these {d — sets Ik+i forms a random p within Tfe. 

Note that a random $P_1$ drawn from $\mathcal{D}_1$ is active with probability 1 since no information
has been exchanged yet between Bob and Alice. We see by induction that a
random p consists of s = (d — integers. Setting k = 1, and using the 

fact that n = 2s — 1, we have the identity 



n = 2(d-l)*-im®*-l. 



( 2 ) 



• Distribution $\mathcal{D}_k$: We enforce the following

• POINT SET invariant: For any $1 \le k \le t$, a random $P_k$ from $\mathcal{D}_k$ is active
with probability at least $2^{-w^2}$.

By abuse of terminology, we say that $P_k \in \mathcal{D}_k$ if sampling from $\mathcal{D}_k$ produces $P_k$ with
nonzero probability. Once the probability of a point set is zero in some $\mathcal{D}_k$, it remains
so in all subsequent distributions $\mathcal{D}_j$ ($j > k$), or put differently,

\[ \mathcal{D}_1 \supseteq \cdots \supseteq \mathcal{D}_t. \]

Let Pi = (Ji, (Ti) be an input point set [pi^ , . . . , Xi^_^i^,pi^} in Pi. In the 

recursive construction of p, ifu is anode of Uk picked by p in step [1], let {P, . . . ,ih} 
be the p+i defined recursively within Tit+i = Pk{v). The set 
is called the $v$-projection of $P_1$. Similarly, one may also refer to the $v$-projection of any
Pj (j < which might be empty. Obviously, it is possible to speak of a random P|„ 
(with V fixed), independently of any Pi, as the point set formed by a random and a 
uniformly distributed random bit vector of size |/fe| — 1. It is this distribution that 
will be understood in any further reference to a random P|„. 

Assume that we have already defined $\mathcal{D}_k$, for $k < t$. A distribution $\mathcal{D}_k$ is associated
with a specific tree $\mathcal{T}_k$. To define $\mathcal{D}_{k+1}$, we must first choose a node $v$ in $\mathcal{U}_k$ and make
$\mathcal{T}_{k+1} = \mathcal{T}_k(v)$ our reference tree for $\mathcal{D}_{k+1}$. Any $n$-point set of $\mathcal{D}_k$ whose probability is
not explicitly set below is assigned probability zero under $\mathcal{D}_{k+1}$. Consider each possible
point set $P_{|v}$ in turn (for $v$ fixed), and apply the following rule:

- If $P_{|v}$ is the $v$-projection of some $P_k$ in $\mathcal{P}_k$, then take one² such $P_k$, and set its
probability under $\mathcal{D}_{k+1}$ to be that of picking $P_{|v}$ randomly.

- Otherwise, take one $P_k \in \mathcal{D}_k$ whose $v$-projection is $P_{|v}$, and again set its probability
under $\mathcal{D}_{k+1}$ to be that of picking $P_{|v}$ randomly.

During that round $k$, Bob reduces the collection of active point sets in $\mathcal{D}_{k+1}$ to form
$\mathcal{P}_{k+1}$. To summarize, a random $P_k$ is defined with reference to a specific tree $\mathcal{T}_k$. Note
that the distribution $\mathcal{D}_k$ is isomorphic to that of a random $P_{|v}$, for fixed $v \in \mathcal{U}_{k-1}$, or
equivalently, a random $(I_k, \sigma_k)$, where $\sigma_k$ is a uniformly distributed random bit vector
of size $|I_k| - 1$.

2.4 Alice’s Query Lines 

As the game progresses, $\mathcal{L}_1$ decreases in size to produce the nested sequence $\mathcal{L}_1 \supseteq
\cdots \supseteq \mathcal{L}_t$. Prior to round $k$, the currently active query set $\mathcal{L}_k$ is associated with the same
reference tree $\mathcal{T}_k$ used to define a random $P_k$. As we observed in the last section, each
node of $\mathcal{U}_k$ corresponds to a unique interval of integers in $[0, 2^b)$. By abuse of notation,
we also let $\mathcal{L}_k$ designate the set of integers $j$ defining the lines $y = 2jx - j^2$ in the set.
We maintain the following:

• QUERY invariant: For any $1 \le k \le t$, the fraction of the leaves in $\mathcal{U}_k$ whose
intervals intersect $\mathcal{L}_k$ is at least $1/b$.

Lemma 4. If $\mathcal{L}_t$ and $\mathcal{P}_t$ satisfy their respective invariants, then $\mathcal{L}_t \times \mathcal{P}_t$ is unresolved.

Proof. Suppose that $\mathcal{L}_t$ satisfies the query invariant and that $\mathcal{L}_t \times \mathcal{P}_t$ is not unresolved:
we show that $\mathcal{P}_t$ must then violate the point set invariant. For each leaf of $\mathcal{U}_t$ whose
interval intersects $\mathcal{L}_t$, pick one $j_i \in \mathcal{L}_t$ in that interval. By Lemma 3 and the query
invariant, this gives us a sequence $j_1 < \cdots < j_m$ of length

\[ m \ge \frac{2^d}{b}. \qquad (3) \]

Given $P_t \in \mathcal{D}_t$, we define the spread of $P_t$, denoted spread$(P_t)$, as the number of
intervals of the form $[j_i, j_{i+1}]$ ($0 \le i \le m$) that intersect the index set $I(P_t)$ (Fig. 4); for
consistency we write $j_0 = 0$ and $j_{m+1} = 2^b - 1$. Suppose that the spread $|S|$ is defined

² It does not matter which one, but it has to be unique.

by some fixed set $S$ of size less than w'^. Of the $m + 1$ candidate intervals $[j_i, j_{i+1}]$, a
random $I_t$ must then avoid $m + 1 - |S|$ of them. Although such an interval may not
always enclose a whole leaf interval, it does contain at least one mark point, and so the
choice of $I_t$ is confined to at most $2^d - m - 1 + |S|$ leaves of $\mathcal{U}_t$. Thus, the probability
that the spread is defined by $S$ is bounded by



\ ) 




< 




m — IS"! 
¥ 



Summing over all S”s of size less than it follows from (3) that 



Prob 



spread (Pt) < 




(4) 




Suppose now that the spread is at least Then 

\[ P_t = \{ p_{i_1}, X_{i_1 i_2}, p_{i_2}, \ldots, p_{i_{s-1}}, X_{i_{s-1} i_s}, p_{i_s} \} \]

includes a subset P* of at least — 1 points pt. , every one of which can be paired 
with a line y = 2kx — of £t, where ij < k < ij+i. Pick a random Pt from T>t, and 
let E denote the event: “all queries from Ct give the same answer yes/no with respect 
to point set Pt.” By Lemma 2, the Xf f^^’s are all of the form or all of the 

form (no mix). As we observed earlier, Pt is isomorphic to the distribution of a 

random (Jt, at), where at is a string of tu® — 1 bits (drawn uniformly, independently). 
The constraint on the X ’s reduces the choice of a random Pt by a factor of at least , 

and hence. 



Prob 



spread (Pt) > 



< 2 " 



(5) 
Putting together (4,5), we find



Prob[ H ] = Prob[ H 
+ Prob[,5’ 



< 2 -^^ + 2 



spread (Pt) < 
spread (Pt) > 




Prob[spread(Pt) < 
Prob[spread(Pt) > w^ \ 



which violates the point set invariant. □ 



During the $k$-th round, Alice chooses an index in Bob's table. As we discussed
earlier, the set of possible choices partitions her current query set $\mathcal{L}_k$ into as many
equivalence classes. An internal node $v$ of $\mathcal{U}_k$ is called heavy if one (or more) of these
classes intersects the intervals associated with a fraction at least $1/b$ of the children of
$v$. The following is a variant of a result of Ajtai [1].



Lemma 5. The union of the intervals associated with the heavy nodes of $\mathcal{U}_k$ contains
at least a fraction $1/2b$ of the leaves' intervals.



Proof. Fix an equivalence class and color the nodes of $\mathcal{U}_k$ whose intervals intersect it.
Mark every non-root colored node that is heavy with respect to the equivalence class.
Then, mark every descendant in $\mathcal{U}_k$ of a marked node. Let $N$ be the number of leaves
in $\mathcal{U}_k$ and let $N_j$ be the number of leaves of $\mathcal{U}_k$ whose depth-$j$ ancestor in $\mathcal{U}_k$ is colored
and unmarked (we include $v$ as one of its ancestors). For $j > 1$, an unmarked, colored,
depth-$j$ node is the child of an unmarked, colored, depth-$(j-1)$ node that is not heavy
for the chosen class, and so $N_j \le N_{j-1}/b$. We have $N_1 \le N$ and, for any $j > 0$,

\[ N_j \le \frac{N}{b^{j-1}}. \]



Repeating this argument for all the other equivalence classes, we find that all the 
unmarked, colored nodes (at a fixed depth / > 0) are ancestors of at most 
leaves. This implies that the number of unmarked, colored leaves is at most < 

$N/2b$. (This follows from (1, 2).) The query invariant guarantees that at least $N/b$ leaves
of $\mathcal{U}_k$ are colored and so at least $N/2b$ are both colored and marked. It follows that the
marked nodes whose parents are unmarked are themselves ancestors of at least $N/2b$
leaves: all these nodes are heavy. □



Alice’s strategy is to keep her active query sets as “entangled” as possible with Bob’s 
point sets. Put differently, ideally the two should form a low-discrepancy set system [6] 
(at least in the one-way sense). The next result says that this is true on at least one level 
of Uk, where many heavy nodes end up being picked by a random Ik. 

Lemma 6. For any 0 < k < t, there is a depth j (0 < / < d) such that, with probability 

2-1 o 

at least 2^“’ ^ , a random Pk from T>k 1^ active and its index set Ik picks at least 
heavy depth- j nodes in its associated Uk. 
Proof. Recall that $\mathcal{D}_k$ is isomorphic to a random $(I_k, \sigma_k)$. Fix once and for all. The
heavy nodes of $\mathcal{U}_k$ are ancestors of at least a fraction $1/2b$ of the leaves (Lemma 5). It
follows that, for some $0 < j < d$, at least a fraction $1/2bd$ of the nodes of depth $j$ are
heavy. Among these, $I_k$ may pick only those that are not picked further up in the tree:
this caveat rules out fewer than dw^ candidate nodes, which by Lemma 3, represents a 
fraction at most dw^/2‘^ of all the nodes of depth j. So, it appears that among the set of 
depth-/ nodes that may be picked by Ik, the fraction a of heavy ones satisfies 

The index set Ik picks depth-/ nodes of 64 at random with no replacement. By 
Hoeffding’s classical bounds [10], the probability that the number of heavy ones picked 
exceeds the lemma’s target of is at least 

I _ > 1 — 2 ^“’^. 

It follows from the point set invariant and the independence of Ik and that, with 

2 3 

probability at least 2^“” — 2^“’ , a random Pk is active and its index set Ik picks at 
least heavy depth-/ nodes in its associated Uk- □ 



2.5 Probability Amplification 

During the $k$-th round, Bob sends to Alice the contents of the cell $T[f_k(\ell, T[f_1(\ell)], \ldots)]$.
The $2^w$ possible values partition the current collection $\mathcal{P}_k$ of active point sets into as
many equivalence classes. We exploit the product nature of the distribution $\mathcal{D}_k$ to amplify
the probability of being active by projecting the distribution on one of its factors.

Lemma 7. For any $0 < k < t$, there exists a heavy node $v$ of $\mathcal{U}_k$ such that, with
probability at least 1/2, a random $P_{k+1}$ drawn from the distribution $\mathcal{D}_{k+1}$ associated
with $\mathcal{T}_{k+1} = \mathcal{T}_k(v)$ belongs to $\mathcal{P}_k$.

Proof. We refer to the depth / in Lemma 6. Let p\s denote the conditional probability 
that a random Pk from Pk belongs to Pk, given that S is exactly the set of heavy nodes 
of depth / picked by Ik. Summing over all subsets S of heavy depth-/ nodes of size at 
least 6^, 

Prob [S = set of heavy depth- / nodes picked by 4 ] ■ p^s 

s 

is the sum, over all S, of the probability that Pk € Pk and that S is precisely the set 
of heavy nodes of depth / picked by its index set p. By Lemma 6, this sum is at least 
2 ~w - 1 ^ therefore p^s* > 2““” for some set S* of at least heavy nodes of 
depth/. 

Because a random $P_k$ whose $I_k$ picks $v$ consists of a random $(I_{k+1}, \sigma_{k+1})$ drawn at
node $v$ independently of the rest of $(I_k, \sigma_k)$, its $v$-projection has a distribution isomorphic
to that of $(I_{k+1}, \sigma_{k+1})$, which is also $\mathcal{D}_{k+1}$. The same is true even if the distribution on
$P_k$ is conditioned upon having $S$ as the set of heavy depth-$j$ nodes picked by $I_k$. If $P_k$
belongs to $\mathcal{P}_k$ then its $v$-projection maps to a unique set $P_{k+1} \in \mathcal{D}_{k+1}$ also in $\mathcal{P}_k$.
Let $p_{|v}$ denote the probability that a random $P_{k+1}$ drawn from the distribution $\mathcal{D}_{k+1}$
associated with $\mathcal{T}_{k+1} = \mathcal{T}_k(v)$ belongs to $\mathcal{P}_k$. It follows that

\[ p_{|S^*} \le \prod_{v \in S^*} p_{|v}. \]



Since \S*\ > w^, it follows that 



P\v > 




1/|S*| 1 

> - 



for some v e S* - O 

Both query and point set invariants are trivially satisfied before round 1. Assume now
that they hold at the opening of round $k < t$. Let $v$ denote the node of $\mathcal{U}_k$ in Lemma 7.
The possible ways of indexing into the table $T$ partition Alice's query set $\mathcal{L}_k$ into as
many equivalence classes. Because $v$ is heavy, the intervals associated with a fraction
at least $1/b$ of its children intersect a particular equivalence class. Alice chooses such
a class and the query lines in it as her new query set $\mathcal{L}_{k+1}$. The tree $\mathcal{U}_{k+1}$ is naturally
derived from $\mathcal{T}_{k+1} = \mathcal{T}_k(v)$, and the query invariant is satisfied at the beginning of
round $k + 1$.

Upon receiving the index from Alice, Bob must choose the contents of the table entry
while staying consistent with past choices. By Lemma 7, a random $P_{k+1}$ from $\mathcal{D}_{k+1}$
(distribution associated with $\mathcal{T}_{k+1}$) is active at the beginning of round $k$ with probability
at least a half. There are $2^w$ choices for the table entry, and so for at least one of them,
with probability at least $(1/2)\,2^{-w} \ge 2^{-w^2}$, a random point set from $\mathcal{D}_{k+1}$ is active at
the beginning of round $k$ and produces a table with that specific entry value. These point
sets constitute the newly active collection $\mathcal{P}_{k+1}$, and the point set invariant still holds at
the beginning of round $k + 1$.

To show that $t$ rounds are needed, we must prove that $\mathcal{L}_k \times \mathcal{P}_k$ is unresolved, for
any $k \le t$. In fact, because of the nesting structure of these products, it suffices to show
that $\mathcal{L}_t \times \mathcal{P}_t$ is unresolved, which follows from Lemma 4. This proves the lower bound
of Theorem 1. □
Abstract. We consider the problem of computing the diameter of a set of $n$ points
in $d$-dimensional Euclidean space under the Euclidean distance function. We describe
an algorithm that in time $O(dn \log n + n^2)$ finds with high probability an arbitrarily
close approximation of the diameter. For large values of $d$ the complexity bound
of our algorithm is a substantial improvement over the complexity bounds of
previously known exact algorithms. Computing and approximating the diameter
are fundamental primitives in high dimensional computational geometry and find
practical application, for example, in clustering operations for image databases.



1 Introduction 

We consider the following problem: given a set of $n$ points in $d$-dimensional Euclidean
space, compute the maximum pairwise Euclidean distance between two points in the
set. This problem is known as the diameter problem or the furthest pair problem. There
are several efficient algorithms for the case when $d = 2$ [26] and $d = 3$ [2,27,6,5,23]
which, however, do not extend to higher dimensional spaces. The diameter problem is
one of the basic problems in high dimensional computational geometry [15,16,17].¹ In
this paper we consider a setting in which the number of points $n$ and the dimension of
the space $d$ are equally important in the complexity analysis.

The exact solution to the diameter problem in arbitrary dimension can be found
using the trivial algorithm that generates all possible $O(n^2)$ inter-point distances and
determines the maximum value. This algorithm runs in time $O(dn^2)$. As noted in [26]
substantial improvements of the asymptotic complexity in terms of $n$ and $d$ must overcome
the fact that for $d \ge 4$ the number of diametral pairs of points can be $\Omega(n^2)$ [12],²
while computing a single inter-point distance takes time $\Omega(d)$. Yao in [33] gives an
algorithm to compute the diameter in time $O(n^{2-a(d)} \log^{1-a(d)} n)$ where $a(d) = 2^{-(d+1)}$
for $d \ge 3$ and fixed. The technique in [33] can be extended to an $o(n^2)$ algorithm in
non-fixed dimension for $d \le \frac{1}{2} \log\log n$.
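For reference, the trivial exact algorithm mentioned above can be written in a few lines.
This is a minimal sketch of the all-pairs scan; the function name and the sample points are
ours, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

def diameter_bruteforce(points):
    """Exact diameter by scanning all pairs: O(n^2) distances, O(d) time each."""
    best = 0.0
    for p, q in combinations(points, 2):
        best = max(best, math.dist(p, q))
    return best

points = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 3, 4, 0), (0, 0, 0, 2)]
diam = diameter_bruteforce(points)
```

On these four points in dimension 4 the furthest pair is $(0,3,4,0)$ and $(0,0,0,2)$, at
distance $\sqrt{29}$.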

¹ Gritzmann and Klee have proposed the term Computational Convexity to denote the study of
combinatorial and algorithmic aspects of polytopes in high dimensional spaces.

² Instead, in dimensions 2 and 3, the number of diametral pairs of points is $O(n)$ [11,28].

J. Nešetřil (Ed.): ESA'99, LNCS 1643, pp. 366-377, 1999.

© Springer-Verlag Berlin Heidelberg 1999

A result of Yao [32] (cited in [33]) shows that all inter-point distances can be computed
in time $O(M(n, d) + nd + n^2)$, where $M(n, d)$ is the time to multiply an $n \times d$ matrix
by a $d \times n$ matrix. Using the asymptotically fastest known square matrix multiplication
algorithm [7] we have M(n, d) < 0(min{nd®^^, dn'*^^}), where s « 2.376. The 
furthest pair is then found trivially with $O(n^2)$ extra time.
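One standard way to reduce all inter-point distances to a matrix product is the identity
$\|p_i - p_j\|^2 = \|p_i\|^2 + \|p_j\|^2 - 2\, p_i \cdot p_j$, where the inner products $p_i \cdot p_j$ are the
entries of the product of the $n \times d$ point matrix with its transpose. We assume, without
having checked [32], that this is the kind of reduction meant here. The sketch below uses
a naive product rather than fast matrix multiplication, and all names are ours.

```python
def all_squared_distances(points):
    """All pairwise squared distances from the Gram matrix G = P P^T."""
    n = len(points)
    # gram[i][j] = p_i . p_j, i.e. the (n x d) by (d x n) matrix product.
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    return [[gram[i][i] + gram[j][j] - 2 * gram[i][j] for j in range(n)]
            for i in range(n)]

def furthest_pair(points):
    """Scan the O(n^2) squared distances for the maximum."""
    d2 = all_squared_distances(points)
    n = len(points)
    return max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: d2[ij[0]][ij[1]])

points = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 3, 4, 0), (0, 0, 0, 2)]
pair = furthest_pair(points)
```

Working with squared distances avoids square roots entirely, since the maximum is
attained by the same pair.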

In order to obtain faster algorithms for large values of $d$, we relax the requirement
by considering algorithms that approximate the diameter up to multiplicative factors
arbitrarily close to one. The main result in this paper (Theorem 12) is the following: the
diameter of a point-set of size $n$ in dimension $d$ is approximated in time $O(dn \log n + n^2)$
within a factor $1 + \varepsilon$, for any real value $\varepsilon > 0$, with probability $1 - \delta$. The constants hidden
in the big-Oh notation depend on user controlled parameters $\varepsilon$ and $\delta$, but not on $d$. Our
result matches or improves asymptotically the trivial exact algorithm in the range $d \ge 4$.
Our algorithm matches asymptotically Yao’s algorithms for d in the range: 1/2 log log n < 
d < and attains better performance for d > 

Another approximation algorithm in the literature [10] attains a fixed approximation
factor $c = \sqrt{5 - 2\sqrt{3}}$, which is not arbitrarily close to one.

Subsequent to the acceptance of this paper we got news of a forthcoming paper
of Borodin, Ostrovsky and Rabani [4] where, amongst other important results, the
approximate furthest pair problem is solved in time roughly 0(nd log with 

high probability. Their result improves asymptotically over our bound for d < n/ log n, 
however, for reasonable values of e, say e = 0.1, the asymptotic improvement is of the 
order of and such gain should be weighted against a more complex coding effort. 

1.1 Applications 

Among the applications of computing (or approximating) the diameter in high dimensional
spaces we mention those in data clustering for image databases [18]. An image
is mapped into a point in a high dimensional space by associating a dimension to each
term of a wavelet expansion of the image considered as a two-dimensional piecewise
constant function [13,19]. Thus similarity clustering of images can be translated into the
problem of determining clusters of such points. One of the most natural measures of
quality for a cluster is its diameter. Note that in such applications the number of dimensions
is very high, thus the complexity due to the dimension must be taken into consideration
in the design of efficient algorithms.

1.2 The Algorithm 

Our algorithm has been inspired by a recent technique for nearest neighbor search described
by Kleinberg [21]. Although the formulation of the closest pair problem seems not
too different from that of the diameter (searching for the smallest inter-point distance
instead of the maximum), the mathematical properties of the two problems are quite
different, so that in general an efficient algorithm for the closest pair problem does not
yield an efficient one for the farthest pair problem.

Intuitively, the method of Kleinberg is based on the idea that if a vector $x \in \mathbb{R}^d$ is
longer than a vector $y \in \mathbb{R}^d$ then this relation is preserved with probability greater than
1/2 in a projection of $x$ and $y$ over a random line. Thus using several projections of the
same set of vectors and a majority voting scheme we can retrieve, with probability close
to one, the "actual" relative length relation between $x$ and $y$. The theory of range spaces
of bounded VC-dimension is invoked to determine how many random lines are needed
to satisfy the constraints imposed by the error parameter $\varepsilon$ and the confidence parameter
$\delta$.

1.3 Organization of the Paper 

In Section 2 we review some basic properties of projections of random vectors and 
of range spaces with bounded VC-dimension. In Section 3 we give the preprocessing 
and query algorithms for furthest point queries. Finally in Section 4 we apply the data 
structures of Section 3 to the problem of determining the diameter, thus establishing the 
main result. 



2 Basic Results 



We denote with $S^{d-1}$ the set of directions in $\mathbb{R}^d$; it can be identified with the set of unit
vectors, or with the set of points on the unit sphere. For any two vectors $x, y \in \mathbb{R}^d$, we
define the set of directions over which the length of the projection of $x$ is longer than or
equal to that of $y$:

\[ Z_{x;y} = \{\, v \in S^{d-1} : |v \cdot x| \ge |v \cdot y| \,\}. \]

We denote with $z_{x;y}$ the relative measure of $Z_{x;y}$ with respect to the entire set of
directions; it represents the probability that the projection of $x$ on a random direction is
longer than the projection of $y$:

\[ z_{x;y} = \frac{|Z_{x;y}|}{|S^{d-1}|} = \Pr_{v \in S^{d-1}} \left[\, |v \cdot x| \ge |v \cdot y| \,\right]. \]

Note that for $x \ne y$ the probability that $|v \cdot x| = |v \cdot y|$ is null.

The idea of this work is that if $x$ is 'significantly' longer than $y$, then the set $Z_{x;y}$ is
large, and it is very probable that a random direction belongs to it. The following lemma
captures this idea:

Lemma 1. If $(1 - \gamma)\|x\| \ge \|y\|$, with $0 \le \gamma < 1$, then $z_{x;y} \ge 1/2 + \gamma/\pi$.

Proof Sketch. Consider the plane $A$ containing the vectors $x, y$, and the projection
$\pi_A(v)$ of $v$ on this plane. Observe that $v \cdot x = \pi_A(v) \cdot x$ and $v \cdot y = \pi_A(v) \cdot y$. So
the relation $v \in Z_{x;y}$ depends only on $\pi_A(v)$, or equivalently on the angle $\varphi$ between
$\pi_A(v)$ and the vector $x$ (or $y$).

Note that $\varphi$ is uniformly distributed in $[0, 2\pi)$. Let $\theta$ be the angle formed by $x$ and
$y$: we find that $v \in Z_{x;y}$ whenever $\cos^2(\theta - \varphi) \le \cos^2\varphi \cdot \|x\|^2/\|y\|^2$, and the quantity
$z_{x;y}$ is minimum when $x$ and $y$ are orthogonal. In this case

\[ z_{x;y} = \frac{2}{\pi} \tan^{-1} \frac{\|x\|}{\|y\|} \ge \frac{2}{\pi} \tan^{-1} \frac{1}{1 - \gamma}. \]

In order to bound the latter quantity, we consider that $\tan^{-1}\bigl((1 - \gamma)^{-1}\bigr)$ is a convex
function, and its derivative in $\gamma = 0$ is $1/2$. So we obtain

\[ \tan^{-1} \frac{1}{1 - \gamma} \ge \frac{\pi}{4} + \frac{\gamma}{2}, \]

from which the lemma follows. □
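The final inequality of the proof is easy to check numerically in the extremal case of
orthogonal vectors with $\|y\| = (1 - \gamma)\|x\|$. The short sketch below evaluates both sides
on a grid of values of $\gamma$; it is a sanity check with names of our choosing, not a
substitute for the convexity argument.

```python
import math

def z_orthogonal(ratio):
    """Measure of Z_{x;y} for orthogonal x, y with ||x|| / ||y|| = ratio."""
    return 2.0 / math.pi * math.atan(ratio)

def lemma1_bound(gamma):
    """Lower bound 1/2 + gamma/pi claimed by Lemma 1."""
    return 0.5 + gamma / math.pi

gammas = [i / 100 for i in range(100)]          # 0.00, 0.01, ..., 0.99
gaps = [z_orthogonal(1 / (1 - g)) - lemma1_bound(g) for g in gammas]
```

The gap vanishes at $\gamma = 0$, where the tangent line touches the convex function, and is
positive elsewhere on the grid.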

Intuitively, we want to fix a set $V$ of directions, and compare the lengths of vectors
by comparing the lengths of their respective projections over elements of $V$, and making
a majority vote. For two vectors $x, y$ we will write that

\[ x \rhd_V y \iff |V \cap Z_{x;y}| \ge \tfrac{1}{2}|V|. \qquad (1) \]

This means that the projection of $x$ is longer than the projection of $y$ over at least half
of the vectors in $V$. Note that $x \rhd_V y$ and $y \rhd_V x$ can simultaneously hold with respect
to the same set $V$.
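The majority vote in (1) is straightforward to sketch. In the code below, random directions
are drawn by normalizing Gaussian vectors, and the sample size of 501 is an arbitrary
illustrative value, not the bound derived later in this section; the function names are ours.

```python
import random

def random_direction(dim, rng):
    """Uniform random direction: a normalized vector of independent Gaussians."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(c * c for c in g) ** 0.5
    return [c / norm for c in g]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def beats(x, y, directions):
    """Relation (1): |v.x| >= |v.y| for at least half of the directions v."""
    wins = sum(abs(dot(v, x)) >= abs(dot(v, y)) for v in directions)
    return 2 * wins >= len(directions)

rng = random.Random(0)
V = [random_direction(10, rng) for _ in range(501)]
x = [3.0] + [0.0] * 9            # ||x|| = 3
y = [0.0, 1.0] + [0.0] * 8       # ||y|| = 1, orthogonal to x
```

Here a single direction favors $x$ with probability $\frac{2}{\pi}\tan^{-1} 3 \approx 0.8$, so with several
hundred directions the vote reports $x \rhd_V y$ and not $y \rhd_V x$ except with negligible
probability.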

For a fixed set of directions $V$, consider two arbitrary vectors $x$ and $y$ such that $x$
is significantly longer than $y$. Lemma 1 says that the set $Z_{x;y}$ is large, so we can hope
that in $V$ there are enough vectors of $Z_{x;y}$ so that $x \rhd_V y$. However, we want this to hold
for any vectors $x, y$, with respect to the same (fixed) set $V$. For this reason we cannot
use Lemma 1 directly, but we will use some VC-dimension techniques. For a detailed
treatment of this topic see [31,1] or the seminal paper [29]. Here we will present only
the definitions needed.

A range space is a pair $(\mathcal{V}, \mathcal{R})$, where $\mathcal{V} = (\Omega, \mathcal{F}, \mu)$ is a probability space, and
$\mathcal{R} \subseteq \mathcal{F}$ is a collection of measurable (w.r.t. the measure $\mu$) subsets of $\Omega$. A finite set
$A \subseteq \Omega$ is said to be shattered by $\mathcal{R}$ if every subset of $A$ can be expressed in the form
$A \cap R$ for some $R \in \mathcal{R}$. The VC-dimension VC-dim$(\mathcal{R})$ of the range space $(\mathcal{V}, \mathcal{R})$ is the
maximum size of a set that can be shattered by $\mathcal{R}$ (hence no set of size > VC-dim$(\mathcal{R})$
can be shattered).
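On a finite ground set, shattering can be tested by brute force, which may help fix the
definition. The sketch below ignores the probability measure and uses integer intervals as
ranges, a textbook example with VC-dimension 2; the function names and the example are
ours and are unrelated to the range space used later in this section.

```python
from itertools import combinations

def is_shattered(points, ranges):
    """True if every subset of `points` equals points ∩ R for some range R."""
    pts = frozenset(points)
    traces = {pts & r for r in ranges}
    return len(traces) == 2 ** len(pts)

def vc_dimension(ground, ranges):
    """Largest size of a shattered subset of `ground`, by exhaustive search."""
    best = 0
    for size in range(1, len(ground) + 1):
        if any(is_shattered(c, ranges) for c in combinations(ground, size)):
            best = size
        else:
            break        # subsets of shattered sets are shattered, so we can stop
    return best

ground = range(6)
# Ranges: all integer intervals [lo, hi] within the ground set, plus the empty set.
intervals = [frozenset(range(lo, hi + 1)) for lo in range(6) for hi in range(lo, 6)]
intervals.append(frozenset())
```

Two points can be shattered by intervals, but for three points $a < b < c$ no interval
contains $a$ and $c$ without $b$.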

There is a natural identification between collections of sets and families of binary-valued
functions: to each subset $R \in \mathcal{R}$ corresponds its indicator function $f_R(x) : \Omega \to
\{0, 1\}$ such that $f_R(x) = 1 \iff x \in R$. This identification is useful to express
combinations of range spaces in the following manner.

For an integer $k \ge 2$, let $u : \{0, 1\}^k \to \{0, 1\}$ and $f_1, \ldots, f_k : \Omega \to \{0, 1\}$ be
binary-valued functions ($u$ represents the combination operator and the $f_i$'s are indicator
functions). Define $u(f_1, \ldots, f_k) : \Omega \to \{0, 1\}$ as the binary-valued function $x \mapsto
u(f_1(x), \ldots, f_k(x))$. Finally, if $\mathcal{A}_1, \ldots, \mathcal{A}_k$ are families of binary-valued functions,
we define $\mathcal{U}(\mathcal{A}_1, \ldots, \mathcal{A}_k)$ to be the family of binary-valued functions $\{\, u(f_1, \ldots, f_k) :
f_i \in \mathcal{A}_i \ \forall i \,\}$. In this way we can obtain $\mathcal{A} \sqcup \mathcal{B} = \{A \cup B : A \in \mathcal{A}, B \in \mathcal{B}\}$ by
choosing $u(a, b) = \max\{a, b\}$, and $\mathcal{A} \sqcap \mathcal{B} = \{A \cap B : A \in \mathcal{A}, B \in \mathcal{B}\}$ by choosing
$u(a, b) = a \cdot b$.

We will use the following theorem by Vidyasagar [30, Th. 4.3], which improves on 
a result by Dudley [8,9]: 

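Theorem 2. If VC-dim$(\mathcal{A}_i)$ is finite for each $i$, then $\mathcal{U} = \mathcal{U}(\mathcal{A}_1, \ldots, \mathcal{A}_k)$ also has finite
VC-dimension, and VC-dim$(\mathcal{U}) < \alpha_k \cdot d$, where $d = \max_i$ VC-dim$(\mathcal{A}_i)$, and $\alpha_k$ is the
smallest integer such that $k < \alpha_k / \log_2(e\,\alpha_k)$.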
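The constant $\alpha_k$ of Theorem 2 is defined implicitly, but it is easy to compute by a linear
search, since $\alpha / \log_2(e\alpha)$ is increasing for $\alpha \ge 1$. The following sketch (the function
name is ours) does so.

```python
import math

def alpha(k):
    """Smallest integer a such that k < a / log2(e * a)."""
    a = 1
    while not k < a / math.log2(math.e * a):
        a += 1
    return a
```

For instance, `alpha(4)` evaluates to 25, because $24 / \log_2(24e) \approx 3.98$ while
$25 / \log_2(25e) \approx 4.11$.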
We consider the probability space $\mathcal{V}$ made by the set of directions $S^{d-1}$ with the
uniform distribution, where $\mu(X) = |X| / |S^{d-1}|$; we take as $\mathcal{R}$ the collection of all the
sets $Z_{x;y}$. The following lemma bounds the VC-dimension of this range space:

Lemma 3. The VC-dimension of the range space $(\mathcal{V}, \mathcal{R})$ defined above is strictly less
than $25(d + 1)$.

Proof. For a vector $u \in \mathbb{R}^d$, let $H_u$ denote the closed hemisphere $\{\, v \in S^{d-1} : u \cdot v \ge
0 \,\}$, and let $\mathcal{H}$ be the collection of all such hemispheres. Then any set $Z_{x;y}$ can be
expressed as the Boolean combination $(H_{x-y} \cap H_{x+y}) \cup (H_{y-x} \cap H_{-x-y})$. Thus we
have $\mathcal{R} \subseteq \mathcal{U}$, where $\mathcal{U} = (\mathcal{H} \sqcap \mathcal{H}) \sqcup (\mathcal{H} \sqcap \mathcal{H})$. A theorem by Radon [^3J implies that
VC-dim$(\mathcal{H}) \le d + 1$. We then apply Theorem 2, with $u(a, b, c, d) = \max\{a \cdot b, c \cdot d\}$,
and the fact that $\alpha_4 = 25$ to obtain that VC-dim$(\mathcal{R}) \le$ VC-dim$(\mathcal{U}) < 25(d + 1)$. □
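The Boolean combination used in the proof rests on the elementary fact that
$|a| \ge |b|$ holds exactly when $a - b$ and $a + b$ are both nonnegative or both nonpositive,
applied to $a = v \cdot x$ and $b = v \cdot y$. The sketch below checks the resulting set identity
on random integer vectors, so that all comparisons are exact. Since membership in these
sets does not depend on the length of $v$, the test vectors are not normalized; the helper
names are ours.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_Z(v, x, y):
    """v belongs to Z_{x;y}: |v.x| >= |v.y|."""
    return abs(dot(v, x)) >= abs(dot(v, y))

def in_H(u, v):
    """v belongs to the closed hemisphere H_u: u.v >= 0."""
    return dot(u, v) >= 0

def in_combination(v, x, y):
    """v belongs to (H_{x-y} ∩ H_{x+y}) ∪ (H_{y-x} ∩ H_{-x-y})."""
    diff = [a - b for a, b in zip(x, y)]
    total = [a + b for a, b in zip(x, y)]
    neg_diff = [-c for c in diff]
    neg_total = [-c for c in total]
    return (in_H(diff, v) and in_H(total, v)) or (in_H(neg_diff, v) and in_H(neg_total, v))

rng = random.Random(1)

def rand_vec(dim=4):
    return [rng.randint(-5, 5) for _ in range(dim)]

trials = [(rand_vec(), rand_vec(), rand_vec()) for _ in range(2000)]
mismatches = sum(in_Z(v, x, y) != in_combination(v, x, y) for v, x, y in trials)
```

No mismatch can occur, which is what the identity asserts.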

A subset $A$ of $\Omega$ is called a $\gamma$-approximation (or $\gamma$-sample) for a range space $(\mathcal{V}, \mathcal{R})$
if for every $R \in \mathcal{R}$ it holds

\[ \left| \frac{|R \cap A|}{|A|} - \mu(R) \right| \le \gamma. \qquad (2) \]

This means that $A$ can be used to obtain a good estimate of the measure of any set
$R \in \mathcal{R}$. The main result, which follows from Lemma 3 and from a fundamental theorem
by Vapnik and Chervonenkis, is the following.

Lemma 4. With probability at least $1 - \delta$ a set $V$ of cardinality

^ (^25(d + 1) log =<^(^108^) 

of vectors chosen uniformly at random from $S^{d-1}$ is a $\gamma$-approximation for the range
space defined above.

The lemma above allows us to make comparisons in the following way: choose a set
$V$ of $f(\varepsilon/\pi, \delta)$ random directions. With probability $1 - \delta$ it is an $(\varepsilon/\pi)$-approximation.
For any $x, y$ with $(1 - \varepsilon)\|x\| \ge \|y\|$, we have that $\mu(Z_{x;y}) \ge 1/2 + \varepsilon/\pi$, by Lemma 1.
From the definition of $\gamma$-approximation it follows that $|Z_{x;y} \cap V| \ge \frac{1}{2}|V|$, so $x \rhd_V y$
by definition (1). This is the idea which we are going to use in the following section.
However, in order to save computations, we will make comparisons using random subsets
of the fixed set $V$.



3 An Algorithm for Farthest-Point Queries 

We will present in this section an $(\varepsilon, \delta)$-approximation scheme for computing the farthest site of a query point $q$. This means that with probability $1 - \delta$ the algorithm gives an answer which is within a factor $1 - \varepsilon$ from the optimal one. The parameters $\varepsilon$ and $\delta$ are chosen by the user before the algorithm starts.
Let $P = \{p_1, \ldots, p_n\}$ be the set of given sites, and $q \in \mathbb{R}^d$ the query point. For simplicity we will assume that $n$ is a power of 2. Let $p^*$ be the farthest site from $q$, i.e., the point such that

$$d(p^*, q) = \max_{p_i \in P} d(p_i, q) ,$$

where $d(p, q) = \|p - q\|$ is the standard Euclidean metric in $\mathbb{R}^d$. Let $Z_\varepsilon$ be the set of sites that are 'almost' as far from $q$ as $p^*$:

$$Z_\varepsilon = \{\, p_i \in P : d(p_i, q) \ge (1 - \varepsilon)\, d(p^*, q) \,\} .$$

The purpose of the algorithm is to give, with probability at least $1 - \delta$, an element of $Z_\varepsilon$.

3.1 Building the Data Structure 

The preprocessing stage is the following. Let $\varepsilon_0 = \log(1 + \varepsilon)/\log n$. We randomly choose $L$ vectors from $S^{d-1}$, where $L = f(\varepsilon_0/12, \delta) = \Theta(d \log d \log^2 n \log\log n)$. These vectors can be obtained for example by the method described by Knuth [22, p. 130]. Let $V = \{v_1, \ldots, v_L\}$ be the set of $L$ directions generated above. The data structure is a matrix $M$, of dimension $L \times n$, where $M[i,j] = v_i \cdot p_j$.
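The preprocessing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directions are drawn by normalizing Gaussian vectors (a standard way to sample the sphere uniformly, assumed here rather than taken from the cited reference), and the names `random_unit_vector`, `dot` and `build_matrix` are hypothetical.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def random_unit_vector(d, rng):
    """Uniform direction on the unit sphere: normalize i.i.d. Gaussians (assumed method)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(g, g))
        if norm > 0.0:
            return [x / norm for x in g]

def build_matrix(sites, L, seed=0):
    """Return (V, M) where V holds L random directions and M[i][j] = v_i . p_j."""
    rng = random.Random(seed)
    d = len(sites[0])
    V = [random_unit_vector(d, rng) for _ in range(L)]
    M = [[dot(v, p) for p in sites] for v in V]
    return V, M
```

Storing all projections once is what lets a query compare two sites without touching their coordinates again.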

3.2 Processing a Query 

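We first define the following relation between sites of $P$: for a set of directions $\Gamma \subseteq V$, we say that $p_i \succeq_\Gamma p_j$ if the projection of $p_i$ is farther than the projection of $p_j$ from the projection of $q$, for at least half of the directions in $\Gamma$. Formally:

$$p_i \succeq_\Gamma p_j \iff (p_i - q) \succeq_\Gamma (p_j - q) .$$

We will use this relation to build a 'tournament' between sites, by making comparisons with respect to a fixed set $\Gamma$: if $p_i \succeq_\Gamma p_j$ and $p_j \not\succeq_\Gamma p_i$, the winner of the comparison is $p_i$; if $p_i \not\succeq_\Gamma p_j$ and $p_j \succeq_\Gamma p_i$, the winner is $p_j$; finally, if both $p_i \succeq_\Gamma p_j$ and $p_j \succeq_\Gamma p_i$, the winner is chosen arbitrarily. Note that the above description is complete, i.e., it cannot occur that $p_i \not\succeq_\Gamma p_j$ and $p_j \not\succeq_\Gamma p_i$: in every direction at least one of the two projections is at least as far from the projection of $q$ as the other, so at least one of the two sites collects half of the directions.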
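The comparison rule can be illustrated as a majority vote over stored projections. In this sketch, which is an assumption-laden illustration rather than the authors' code, `proj[i]` is the list of values $v \cdot p_i$ for the directions in the chosen set and `proj_q` the list of values $v \cdot q$; the function names are hypothetical, and ties are broken in favour of the first argument.

```python
def at_least_as_far(proj_i, proj_j, proj_q):
    """True if |v.(p_i - q)| >= |v.(p_j - q)| for at least half of the directions."""
    votes = sum(
        1
        for a, b, c in zip(proj_i, proj_j, proj_q)
        if abs(a - c) >= abs(b - c)
    )
    return 2 * votes >= len(proj_q)

def comparison_winner(i, j, proj, proj_q):
    """Return the index of the winner of the comparison between sites i and j."""
    i_ok = at_least_as_far(proj[i], proj[j], proj_q)
    j_ok = at_least_as_far(proj[j], proj[i], proj_q)
    if j_ok and not i_ok:
        return j
    # Either i wins outright, or both relations hold and the choice is arbitrary.
    return i
```

Because $v \cdot (p - q) = v \cdot p - v \cdot q$, only the precomputed projections are needed, never the site coordinates.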

Phase A. We first extract a random subset $\Gamma \subseteq V$ of cardinality $c_1 \log^2 n$ (the value of $c_1$ will be given in Lemma 8). We make extractions with replacement, so we permit $\Gamma$ to be a multiset. Let $b = \log|\Gamma| = \Theta(\log\log n)$. We assume for simplicity that $b$ has an integer value.

We compute the values $v \cdot q$ for all $v \in \Gamma$. Then we build a complete binary tree $T$ of depth $\log n$. We randomly associate the sites in $P$ with the leaves of $T$. To every internal node $x$ at height $1 \le h \le b$, we associate a random subset $\Gamma_x \subseteq \Gamma$ of size $c_2 + c_2' h$ (the appropriate values for $c_2$ and $c_2'$ will be given in Lemma 9). To higher nodes, with height $h > b$, we associate the entire set $\Gamma$.

Now we make a tournament between the sites, proceeding from the leaves towards the root of $T$: to each internal node $x$ we associate the winner of the comparison between the sites associated with its children, with respect to the set $\Gamma_x$. Let $p_A$ be the winner of this tournament, i.e., the site which at the end of Phase A will be associated with the root of $T$.
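A minimal sketch of the Phase A tournament follows. It assumes that the rows of `M` are already restricted to the sampled directions, that `q_proj[r]` holds the projection of the query on direction `r`, and that the constants `c2` and `c2p` are placeholder values chosen for illustration (the analysis fixes them in Lemma 9). All function names are hypothetical.

```python
import math
import random

def play(i, j, rows, M, q_proj):
    """Majority-vote comparison of sites i and j over the given direction rows."""
    rows = list(rows)
    vi = sum(1 for r in rows if abs(M[r][i] - q_proj[r]) >= abs(M[r][j] - q_proj[r]))
    vj = sum(1 for r in rows if abs(M[r][j] - q_proj[r]) >= abs(M[r][i] - q_proj[r]))
    i_ok, j_ok = 2 * vi >= len(rows), 2 * vj >= len(rows)
    if j_ok and not i_ok:
        return j
    return i  # i wins outright, or the tie is broken arbitrarily in favour of i

def phase_a(M, q_proj, c2=4, c2p=2, seed=0):
    """Return the index of the site that reaches the root of the tournament tree."""
    rng = random.Random(seed)
    k, n = len(M), len(M[0])
    assert (n & (n - 1)) == 0, "n is assumed to be a power of 2"
    b = max(1, int(math.log2(k)))
    level = list(range(n))
    rng.shuffle(level)  # random assignment of sites to leaves
    h = 0
    while len(level) > 1:
        h += 1
        nxt = []
        for t in range(0, len(level), 2):
            if h <= b:
                # low levels: a small random subset of the directions
                rows = rng.sample(range(k), min(k, c2 + c2p * h))
            else:
                # high levels: the entire set of directions
                rows = range(k)
            nxt.append(play(level[t], level[t + 1], rows, M, q_proj))
        level = nxt
    return level[0]
```

Cheap comparisons are used where there are many nodes and expensive ones only near the root, which is what keeps the total work linear in $n$.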
Phase B. Independently from all the above, we randomly choose a subset $P_0 \subseteq P$ of size $c_3 \log^2 n$, where $c_3$ will be defined in Lemma 7. We compute the distances between $q$ and the sites in $P_0$. Let $p_B$ be the site of $P_0$ farthest from $q$.

The algorithm finishes by returning the site, between $p_A$ and $p_B$, which is farthest from $q$.

3.3 Correctness 

We now prove that the algorithm correctly computes an element of $Z_\varepsilon$. First of all, observe that, as a consequence of Lemma 4, the set $V$ created during the preprocessing is an $(\varepsilon_0/12)$-approximation of the range space $(P, \mathcal{R})$, with probability $1 - \delta$. In the following, we will assume that this event occurred.

Lemma 5. Let $V$ be an $(\varepsilon_0/12)$-approximation, and $v$ a random element of $V$. Let $q \in \mathbb{R}^d$, $p_i, p_j \in P$, and suppose that $(1 - \gamma)d(p_i, q) \ge d(p_j, q)$, with $\varepsilon_0 \le \gamma \le 1/2$. The probability that $|v \cdot (p_i - q)| \ge |v \cdot (p_j - q)|$ is at least $1/2 + \gamma/5$.

Proof. Let $x = p_i - q$ and $y = p_j - q$. The measure $\mu(Z_{x,y})$ of $Z_{x,y}$ is at least $1/2 + \gamma/\pi$, by Lemma 1. As $V$ is an $(\varepsilon_0/12)$-approximation, the probability that an element of $V$ belongs to $Z_{x,y}$ is at most $\varepsilon_0/12$ less than that value. So this probability is at least $\mu(Z_{x,y}) - \varepsilon_0/12 \ge 1/2 + \gamma/5$, and the lemma follows. □

Lemma 6. Suppose $(1 - \gamma)d(p_i, q) \ge d(p_j, q)$, and let $\Gamma'$ be a random subset of $\Gamma$, of cardinality $k$. The probability that $p_j \succeq_{\Gamma'} p_i$ is at most $e^{-\gamma^2 k/36}$.

Proof Sketch. Let $x = p_i - q$ and $y = p_j - q$. The relation $p_j \succeq_{\Gamma'} p_i$ is equivalent to $y \succeq_{\Gamma'} x$, which means $|\Gamma' \cap Z_{y,x}| \ge k/2$. As $Z_{y,x}$ and $Z_{x,y}$ intersect only in a set of measure zero, we need only bound the probability that $|\Gamma' \cap Z_{x,y}| \le k/2$. Define $k$ Poisson trials $X_1, \ldots, X_k$, where $X_r$ is 1 if the $r$-th vector in $\Gamma'$ belongs to $Z_{x,y}$, and 0 otherwise. From Lemma 5 it follows that the probability of success is $\Pr[X_r = 1] \ge 1/2 + \gamma/5$. Defining the random variable $X = \sum_r X_r = |\Gamma' \cap Z_{x,y}|$, the event we are interested in is $X \le k/2$. The lemma follows by some calculations and by application of the Chernoff bound (see e.g. [24]). □
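As a numerical sanity check of the Chernoff step, the following sketch compares the exact lower tail of a binomial variable with success probability $1/2 + \gamma/5$ against a bound of the form $e^{-\gamma^2 k/36}$, the form in which the lemma is applied later in the section. Treating the trials as independent with exactly this success probability is a simplifying assumption, and the function names are hypothetical.

```python
import math

def exact_loss_probability(k, p):
    """Pr[X <= k/2] for X ~ Binomial(k, p), computed exactly term by term."""
    return sum(
        math.comb(k, r) * p ** r * (1.0 - p) ** (k - r)
        for r in range(0, k // 2 + 1)
    )

def exponential_bound(k, gamma):
    """The bound exp(-gamma^2 k / 36) on the probability of a wrong comparison."""
    return math.exp(-gamma * gamma * k / 36.0)
```

For instance, `exact_loss_probability(200, 0.6)` can be compared with `exponential_bound(200, 0.5)` to see how much slack the bound leaves.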

The motivation for Phase B is that if the set $Z_\varepsilon$ contains too many sites, one of them could eliminate $p^*$ during the early stage of the tournament, and then be eliminated by some site not in $Z_\varepsilon$. On the other hand, if $Z_\varepsilon$ is sufficiently small, there is a high probability that $p^*$ will reach at least level $b$ of the tournament tree; in this case, we can show that the winner $p_A$ is an element of $Z_\varepsilon$. Intuitively, at each comparison the winner can be slightly closer to $q$ than the loser: if $p^*$ is eliminated too early in the competition, all these small errors can take us too close to $q$; if instead $p^*$ reaches at least level $b$, the final error is small with high probability.

We begin by proving that if $Z_\varepsilon$ is large, then Phase B succeeds in finding an approximate farthest point of $q$ with high probability:

Lemma 7. If $|Z_\varepsilon| \ge \gamma_1 n/\log^2 n$, then with probability at least $1 - \delta$ it holds that $p_B \in Z_\varepsilon$.
Proof. When we randomly select the elements of $P_0$ from $P$, the probability of taking an element not in $Z_\varepsilon$ is at most $1 - \gamma_1/\log^2 n$. The probability that no element of $P_0$ belongs to $Z_\varepsilon$ is at most that quantity raised to the power $|P_0| = c_3 \log^2 n$. We can choose $c_3 = c_3(\gamma_1, \delta, n)$ such that $(1 - \gamma_1/\log^2 n)^{c_3 \log^2 n} \le \delta$. If we choose such a $c_3$, with probability at least $1 - \delta$ there is at least one element of $Z_\varepsilon$ in $P_0$, so $p_B$ certainly belongs to $Z_\varepsilon$. □
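One sufficient choice of the constant in Lemma 7 can be made explicit with the inequality $1 - x \le e^{-x}$. This is a worked sketch of one admissible value, not necessarily the constant the authors had in mind:

```latex
\left(1 - \frac{\gamma_1}{\log^2 n}\right)^{c_3 \log^2 n}
  \;\le\; \exp\!\left(-\frac{\gamma_1}{\log^2 n}\cdot c_3 \log^2 n\right)
  \;=\; e^{-\gamma_1 c_3}
  \;\le\; \delta
  \qquad\text{whenever}\qquad
  c_3 \;\ge\; \frac{1}{\gamma_1}\,\ln\frac{1}{\delta}.
```

With this choice the constant depends only on $\gamma_1$ and $\delta$.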

Let us see now what happens if Zg is small. Define the constant 72 = 72(e) such 
that > 1 — e. We denote by the following event: 

$$\mathcal{E}_1 = \left\{\, \exists\, p_i, p_j \in P,\ i \ne j :\ p_j \succeq_\Gamma p_i,\ \left(1 - \frac{\gamma_2}{\log n}\right) d(p_i, q) \ge d(p_j, q) \,\right\} ,$$

that is, the event that there exist two sites, with distances significantly different from $q$, for which the comparison based on projections on the vectors of $\Gamma$ gives the wrong answer. We can give a bound to the probability that this event occurs:

Lemma 8. The probability of event $\mathcal{E}_1$ is less than or equal to $\delta/3$.

Proof. There are $n(n - 1) < n^2$ ordered pairs of sites to consider: for each pair we apply Lemma 6, with $\gamma = \gamma_2/\log n$ and $k = |\Gamma| = c_1 \log^2 n$, to bound the probability of error. The probability of $\mathcal{E}_1$ is bounded by the sum of these probabilities. The lemma follows by defining suitably the value of $c_1 = c_1(\gamma_2, \delta, n)$ in such a way that $n^2 \cdot e^{-\gamma_2^2 c_1/36} \le \delta/3$. □

Let us denote by $\mathcal{E}_2$ the event "$p^*$ does not reach level $b$ in the tournament tree", that is, it is not assigned, during the tournament in Phase A, to any node at level $b$.

Lemma 9. If $|Z_\varepsilon| < \gamma_1 n/\log^2 n$, the probability of $\mathcal{E}_2$ is less than or equal to $2\delta/3$.

Proof. Consider the leaf of $T$ to which $p^*$ was assigned, the node $x$ at level $b$ along the path from this leaf to the root ($x$ is the node to which $p^*$ should be assigned, if $\mathcal{E}_2$ does not occur), and the subtree $T_x$ rooted at $x$.

We first of all exclude the possibility that in $T_x$ there are sites of $Z_\varepsilon$ other than $p^*$. The number of leaves in $T_x$ is $2^b = |\Gamma| = c_1 \log^2 n$, so the probability that $|T_x \cap Z_\varepsilon| = 1$ is

n-lZehf n 
2 >= -1 

By defining suitably $\gamma_1 = \gamma_1(c_1, \delta, n)$ we can make this probability greater than $1 - \delta/3$. So we can assume that $p^*$ is the only site of $Z_\varepsilon$ in the subtree $T_x$. We now show that in this case it reaches level $b$ with probability $1 - \delta/3$.

If $p^*$ reached level $h - 1$, it is compared, at level $h$, with an element not in $Z_\varepsilon$, and the comparison is made using $|\Gamma_x| = c_2 + c_2' h$ vectors of $\Gamma$. From Lemma 6, with $\gamma = \varepsilon$, it follows that the probability that $p^*$ loses the comparison is bounded by

$$e^{-\varepsilon^2 (c_2 + c_2' h)/36} = e^{-\varepsilon^2 c_2/36} \cdot e^{-\varepsilon^2 c_2' h/36} .$$

We can define the constants $c_2 = c_2(\varepsilon, \delta)$ and $c_2' = c_2'(\varepsilon)$ such that the left factor is less than $\delta/3$, and the right factor is less than $2^{-h}$.

Summing these quantities for $h = 1, \ldots, b$ we finally obtain that the probability that $p^*$ is eliminated is bounded by $\delta/3$. Together with the fact that the probability that more than one site of $Z_\varepsilon$ belongs to $T_x$ is less than $\delta/3$, this implies the lemma. □

We can now prove that if $Z_\varepsilon$ is small, Phase A succeeds in finding a good answer:

Lemma 10. If $|Z_\varepsilon| < \gamma_1 n/\log^2 n$, then with probability at least $1 - \delta$ it holds that $p_A \in Z_\varepsilon$.

Proof. With probability $1 - \delta$ neither $\mathcal{E}_1$ nor $\mathcal{E}_2$ occurs. In this case we show by induction on $h \ge b$ that there exists a site $p_{(h)}$, assigned to a node at level $h$, such that

$$d(p_{(h)}, q) \ge \left(1 - \frac{\gamma_2}{\log n}\right)^{h-b} d(p^*, q) , \qquad (3)$$

and this implies that $p_A \in Z_\varepsilon$.

The fact that $\mathcal{E}_2$ does not occur guarantees that $p^*$ reached level $b$, so we can start the induction, for $h = b$, with $p_{(b)} = p^*$. Suppose now that (3) holds at level $h$, and let $p'_{(h)}$ be the adversary of $p_{(h)}$. If $d(p'_{(h)}, q) \ge (1 - \gamma_2/\log n)\, d(p_{(h)}, q)$, then both $p_{(h)}$ and $p'_{(h)}$ satisfy (3) for level $h + 1$, so relation (3) holds in any case. If instead $d(p'_{(h)}, q) < (1 - \gamma_2/\log n)\, d(p_{(h)}, q)$, as we are assuming that $\mathcal{E}_1$ does not occur, $p_{(h)}$ certainly wins the comparison, so $p_{(h+1)} = p_{(h)}$ and relation (3) is valid for level $h + 1$.

From the inductive argument it follows that (3) holds for h = log n, where P{\og n) = 
Pa is the site assigned to the root of T. Being 6 > 1, from the fact that (1 — > 

and from the definition of 72, we obtain 

( \ logn— 1 

d{p*,q)>e^'^^d{p*,q)>{l-e)d{p*,q) , 

that is what we wanted to prove. □ 

The correctness of the algorithm follows from both Lemma 7 and Lemma 10. 

3.4 Complexity 

The preprocessing consists in the computation of an inner product, which requires time $O(d)$, for each element of the matrix $M$. The time needed is thus $O(dLn) = O(d^2 n \log d \log^2 n)$. The space required to store $M$ is $O(Ln) = O(dn \log d \log^2 n)$.

To answer a query, we first compute $v \cdot q$ for $v \in \Gamma$. The time required for this is $O(|\Gamma| \cdot d) = O(d \log^2 n)$. Each comparison of type $p_i \succeq_{\Gamma_x} p_j$ requires time $O(|\Gamma_x|)$, because all values $v \cdot p_i$ are already stored in $M$. The number of nodes at level $h$ is $n 2^{-h}$, so the total number of operations at levels $1 \le h \le b$ is

$$\sum_{h=0}^{b} (c_2 + c_2' h)\, n 2^{-h} = c_2 n \sum_{h=0}^{b} 2^{-h} + c_2' n \sum_{h=0}^{b} h 2^{-h} < 2(c_2 + c_2') n = O(n) .$$
For levels $h > b$ each comparison is made using the entire set $\Gamma$, where $|\Gamma| = 2^b$. We have

$$\sum_{h=b+1}^{\log n} |\Gamma| \cdot n 2^{-h} = \sum_{h=b+1}^{\log n} |\Gamma| \cdot n 2^{b-h} 2^{-b} = n \sum_{h=b+1}^{\log n} 2^{b-h} < n .$$

We thus obtain that the total cost of Phase A is $O(n + d \log^2 n)$. Phase B requires time $O(d \log^2 n)$. We conclude that the time required to answer a query is $O(n + d \log^2 n)$. We summarize the properties of the farthest-point algorithm in the following theorem:



Theorem 11. With probability $1 - \delta$ the set $V$ created in the preprocessing is an $(\varepsilon_0/12)$-approximation. In this case, the probability that the algorithm returns an element of $Z_\varepsilon$ is, for each query, at least $1 - \delta$. The complexity of the algorithm is $O(d^2 \log d \cdot n \log^2 n)$ for the preprocessing and $O(n + d \log^2 n)$ for answering a query.



4 Computing the Diameter of a Point Set 

We now show how to use the algorithm for farthest-point queries to find the diameter $d_P$ of a set of points $P = \{p_1, \ldots, p_n\}$ in $\mathbb{R}^d$.

First of all, we use a dimension-reduction technique, by projecting all the points on a random subspace of dimension $k = O(\varepsilon^{-2} \log n)$. Let $P'$ be the resulting set of points in $\mathbb{R}^k$ and $d_{P'}$ its diameter. The Johnson-Lindenstrauss Lemma (see [20,14]) affirms that with high probability all inter-point distances are preserved, so that $(1 + \varepsilon/2)\, d_P \ge d_{P'} \ge (1 - \varepsilon/2)\, d_P$.

Next, we compute the set $V$ and the matrix $M$ as in the preprocessing stage of the farthest-point algorithm, using $\varepsilon/2$ as approximation parameter.

For each $p'_i \in P'$, we perform a farthest-point query where $q = p'_i$ and the sites are the remaining points in $P'$; let $F_\varepsilon(p'_i)$ be the point returned by the algorithm described in Section 3. We compute all the distances $d(p'_i, F_\varepsilon(p'_i))$. Let $\tilde{d}_{P'}$ be the maximum distance computed in this way. Our main result is the following:

Theorem 12. Let $d_P$ be the diameter of the point set $P$, and $\tilde{d}_{P'}$ the value computed by the algorithm above. Then, with probability $1 - \delta$, it holds that $(1 + \varepsilon/2)\, d_P \ge \tilde{d}_{P'} \ge (1 - \varepsilon)\, d_P$. The complexity of the algorithm is $O(nd \log n + n^2)$.

Proof. Let $(p^*, q^*)$ be a pair of points in $P'$ such that $d(p^*, q^*) = d_{P'}$. When we make the farthest-point query with $q = p^*$, the point $F_\varepsilon(p^*)$ is, with probability $1 - \delta$, such that $d(p^*, F_\varepsilon(p^*)) \ge (1 - \varepsilon/2)\, d_{P'}$. The value $\tilde{d}_{P'}$ computed in the last step satisfies $\tilde{d}_{P'} \ge (1 - \varepsilon/2)\, d_{P'} \ge (1 - \varepsilon/2)^2 d_P \ge (1 - \varepsilon)\, d_P$. Moreover, the distances computed by this algorithm are certainly at most $d_{P'}$.

The time to perform the projection at the first step is $O(nd \log n)$. Then we perform the preprocessing and $n$ queries of the algorithm in Section 3, in dimension $O(\log n)$. As a consequence of Theorem 11, we obtain that the overall complexity of the algorithm is $O(nd \log n + n^2)$. □
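The overall structure of the diameter computation can be sketched as follows. Two simplifications are assumed and should be read as such: the random subspace is replaced by a scaled Gaussian random matrix (a common variant of the Johnson-Lindenstrauss projection), and the farthest-point query of Section 3 is replaced by an exact brute-force stand-in. All names are hypothetical.

```python
import math
import random

def random_projection(points, k, rng):
    """Project points to k dimensions with a scaled Gaussian matrix (assumed variant)."""
    d = len(points[0])
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(g * x for g, x in zip(row, p)) for row in G] for p in points]

def farthest_site(points, i):
    """Stand-in for the Section 3 query: exact farthest point from points[i]."""
    others = (j for j in range(len(points)) if j != i)
    return max(others, key=lambda j: math.dist(points[i], points[j]))

def approx_diameter(points, k=None, seed=0):
    """Maximum over all points of the distance to the reported farthest site."""
    rng = random.Random(seed)
    pts = random_projection(points, k, rng) if k is not None else points
    return max(
        math.dist(pts[i], pts[farthest_site(pts, i)]) for i in range(len(pts))
    )
```

With `k=None` the sketch skips the projection and returns the exact diameter; plugging in the randomized query would give the running time stated in the theorem.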
A Nearly Linear-Time Approximation Scheme for the
Euclidean k-median Problem
Abstract. In the k-median problem we are given a set $S$ of $n$ points in a metric space and a positive integer $k$. The objective is to locate $k$ medians among the points so that the sum of the distances from each point in $S$ to its closest median is minimized. The k-median problem is a well-studied, NP-hard, basic clustering problem which is closely related to facility location. We examine the version of the problem in Euclidean space. Obtaining approximations of good quality had long been an elusive goal and only recently Arora, Raghavan and Rao gave a randomized polynomial-time approximation scheme for the Euclidean plane by extending techniques introduced originally by Arora for Euclidean TSP. For any fixed $\varepsilon > 0$, their algorithm outputs a $(1 + \varepsilon)$-approximation in $O(n k n^{O(1/\varepsilon)} \log n)$ time.

In this paper we provide a randomized approximation scheme for points in d-dimensional Euclidean space, with running time $O(2^{1/\varepsilon^d} n \log n \log k)$, which is nearly linear for any fixed $\varepsilon$ and $d$. Our algorithm provides the first polynomial-time approximation scheme for k-median instances in d-dimensional Euclidean space for any fixed $d > 2$. To obtain the drastic running time improvement we develop a structure theorem to describe hierarchical decomposition of solutions. The theorem is based on a novel adaptive decomposition scheme, which guesses at every level of the hierarchy the structure of the optimal solution and modifies accordingly the parameters of the decomposition. We believe that our methodology is of independent interest and can find applications to further geometric problems.



1 Introduction 



In the k-median problem we are given a set $S$ of $n$ points in a metric space and a positive integer $k$. The objective is to locate $k$ medians (facilities) among the points so that the sum of the distances from each point in $S$ to its closest median is minimized. The k-median problem is a well-studied, NP-hard problem which falls into the general class of clustering problems: partition a set of points into clusters so that the points within a cluster are close to each other with respect to some appropriate measure. Moreover, k-median is closely related to uncapacitated facility location, a basic problem in the operations research literature (see e.g. [9]). In the latter problem, in addition to the set $S$ of points, we are also given a cost $c_i$ for opening a facility at point $i$. The objective is to open an unspecified number of facilities at a subset of $S$ so as to minimize the sum of the cost to open the facilities (facility cost) plus the cost of assigning each point to the nearest open facility (service cost).
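The two objectives just defined can be written down directly. The sketch below assumes Euclidean distances and points given as coordinate tuples; the function names and the `opening_cost` list are hypothetical illustrations.

```python
import math

def kmedian_cost(points, medians):
    """Sum over all points of the distance to the closest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)

def facility_location_cost(points, open_idx, opening_cost):
    """Facility cost of the opened points plus the service cost of all points."""
    facilities = [points[i] for i in open_idx]
    facility_cost = sum(opening_cost[i] for i in open_idx)
    return facility_cost + kmedian_cost(points, facilities)
```

The only difference between the two problems is whether the number of open facilities is fixed to $k$ or paid for through the opening costs.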

J. Nešetřil (Ed.): ESA'99, LNCS 1643, pp. 378-389, 1999.
1.1 Previous Work

The succession of results for k-median is as follows. Lin and Vitter [15] used their filtering technique to obtain a solution of cost at most $(1 + \varepsilon)$ times the optimum but using $(1 + 1/\varepsilon)(\ln n + 1)k$ medians. They later refined their technique to obtain a solution of cost $2(1 + \varepsilon)$ times the optimum while using at most $(1 + 1/\varepsilon)k$ medians [14]. The first non-trivial approximation algorithm that achieves feasibility as well, i.e. uses $k$ medians, combines the powerful randomized algorithm by Bartal for approximation of metric spaces by trees [4,5] with an approximation algorithm by Hochbaum for k-median on trees [13]. The ratio thus achieved is $O(\log n \log\log n)$. This algorithm was subsequently refined and derandomized by Charikar et al. [6] to obtain a guarantee of $O(\log k \log\log k)$. Only very recently, Charikar and Guha and independently Tardos and Shmoys reported the first constant-factor approximations [7]. In contrast, the uncapacitated facility location problem, in which there is no a priori constraint on the number of facilities, seems to be better understood. Shmoys, Tardos and Aardal [18] gave a 3.16-approximation algorithm. This was later improved by Guha and Khuller [12] to 2.408 and more recently to 1.736 by Chudak [8].

For the problem of interest in this paper, i.e. k-median on the Euclidean plane, a randomized polynomial-time approximation scheme was given by Arora, Raghavan and Rao in [3]. For any fixed $\varepsilon > 0$, their algorithm outputs a $(1 + \varepsilon)$-approximation with probability $1 - o(1)$ and runs in $O(n k n^{O(1/\varepsilon)} \log n)$ time, worst case. This development followed the breakthrough approximation scheme of Arora [1,2] for the Traveling Salesman Problem and other geometric problems. While the work in [3] used techniques from the TSP approximation scheme, the different structure of the optimal solutions for k-median and TSP necessitated the development of a new structure theorem to hierarchically decompose solutions. We elaborate further on this issue during the exposition of our results in the next paragraph. The dependence of the running time achieved by the methods of Arora, Raghavan and Rao on $1/\varepsilon$ is particularly high. For example, the approximation scheme can be extended to higher-dimension instances but runs in quasi-polynomial time for a set of points in $\mathbb{R}^d$ for fixed $d > 2$.



1.2 Results and Techniques 

Results. We provide a randomized approximation scheme which, for any fixed $\varepsilon > 0$, outputs in expectation a $(1 + \varepsilon)$-approximately optimal solution, in time $O(2^{1/\varepsilon} n \log n \log k)$ worst case. This time bound represents a drastic improvement on the result in [3]. For any fixed accuracy $\varepsilon$ desired, the dependence of the running time on $1/\varepsilon$ translates to a (large) constant hidden in the near-linear asymptotic bound $O(n \log n \log k)$ compared to the exponent of a term polynomial in $n$ in the bound of Arora, Raghavan and Rao. Moreover, for inputs in $\mathbb{R}^d$ our algorithm extends to yield a running time of $O(2^{1/\varepsilon^d} n \log n \log k)$, which yields for the first time a polynomial-time approximation scheme for fixed $d > 2$. The ideas behind the new k-median algorithm also yield improved, nearly linear-time, approximation schemes for uncapacitated facility location. We now elaborate on the techniques we use to obtain our results.
Techniques. Perhaps the most important contribution of our paper lies in the new ideas we introduce to overcome the limitations of the approach employed by Arora, Raghavan and Rao in [3]. To introduce these ideas we sketch first some previous developments, starting with the breakthrough results in [1,2] (see also [16] for a different approximation scheme for Euclidean TSP).

A basic building block for Arora's [1,2] results on TSP was a structure theorem providing insight into how much the cost of an optimal tour could be affected in the following situation. Roughly speaking, the plane is recursively dissected into a collection of rectangles of geometrically decreasing area, represented by a quadtree data structure. For every box in the dissection one places a fixed number, dependent on the desired accuracy $\varepsilon$, of equidistant portals on the boundary of the box. The optimal TSP tour can cross between adjacent rectangles any number of times; a portal-respecting tour is allowed to cross only at portals. How bad can the cost of a deflected, portal-respecting tour be compared to the optimum? Implicitly, Arora used a charging argument on the edges in an optimal solution to show that the edges could be made to be portal-respecting. We sketch now his approach, which was made explicit and applied to k-median in [3].

The original solution is assumed to consist of a set of edges and is assumed to be surrounded by a rectangle with sidelength polynomial in $n$ (cf. Section 2). At level $i$ of the dissection, the rectangles at this level with sidelength $2^i$ are cut by vertical and horizontal lines into rectangles of sidelength $2^{i-1}$. The x- and y-coordinates of the dissection are randomly shifted at the beginning, so that the probability that an edge $s$ in a solution is cut at level $i$ is $O(\mathrm{length}(s)/2^i)$. Let $m$ denote the number of portals along the dissection lines. If $s$ is cut at level $i$ it must be deflected through a portal, paying additional cost $O(2^i/m)$. Summing over all the $O(\log n)$ levels of the decomposition, the expected deflection cost of edge $s$ in a portal-respecting solution is at most

$$\sum_{i=1}^{O(\log n)} O\!\left( \frac{\mathrm{length}(s)}{2^i} \cdot \frac{2^i}{m} \right) \qquad (1)$$

Selecting $m = O(\log n/\varepsilon)$ demonstrates the existence of a portal-respecting solution of cost $(1 + \varepsilon)\mathrm{OPT}$. The running time of the dynamic programming contains a term exponential in $m$, hence the $n^{O(1/\varepsilon)}$ term in the overall running time.
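The summation in (1) can be worked out explicitly. As a sketch, assume there are exactly $c \log n$ levels for some constant $c$ (the constant is introduced here only for illustration); the factors $2^i$ cancel, so every level contributes the same amount:

```latex
\sum_{i=1}^{c \log n} \frac{\mathrm{length}(s)}{2^i} \cdot \frac{2^i}{m}
  \;=\; \frac{c \log n}{m}\,\mathrm{length}(s),
\qquad\text{so}\qquad
m = \frac{c \log n}{\varepsilon}
\;\Longrightarrow\;
\text{expected extra cost of } s \;=\; \varepsilon \cdot \mathrm{length}(s).
```

Summing over all edges of the optimal solution, by linearity of expectation, gives an expected additive increase of $\varepsilon \cdot \mathrm{OPT}$.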

Arora additionally used a "patching lemma" argument to show that the TSP could be made to cross each box boundary $O(1/\varepsilon)$ times. This yielded an $O(n(\log n)^{O(1/\varepsilon)})$ time
algorithm for TSP. (This running time was subsequently improved by Rao and Smith to 
+ n log n) [17] while still using 6>(log n/e) portals). The /c-median method 
did not, however, succumb to a patching lemma argument, thus the running time for the algorithms in [3] remained $O(n k n^{O(1/\varepsilon)} \log n)$.

Our method reduces the number of portals $m$ to $O(1/\varepsilon)$. That is, we remove the $\log n$ factor in the number of portals that appears to be inherent in Arora, Raghavan, and Rao's charging-based methods and even in Arora's charging-plus-patching-based methods.

Adaptive dissection. We outline some of the ideas behind the reduced value for $m$. The computation in (1) exploits linearity of expectation by showing that the "average" dissection cut is good enough. The complicated dependencies between the dissection lines across all $O(\log n)$ levels seem prohibitive to reason about directly. On the other hand, when summing the expectations across all levels an $O(\log n)$ factor creeps in, which apparently has to be offset by setting $m$ to $\log n/\varepsilon$. We provide a new structure theorem to characterize the structure of near-optimal solutions. In contrast to previous approaches, given a rectangle at some level in the decomposition, it seems a good idea to choose more than one possible "cut", hoping that one of them will hit a small number of segments from the optimum solution. This approach gives rise to the adaptive dissection idea, in which the algorithm "guesses" the structure of the part of the solution contained in a given rectangle and tunes accordingly the generation of the subrectangles created for the next level of the dissection. In the k-median problem the guess consists of the area of the rectangle which is empty of facilities. Let $L$ be the maximum sidelength of the subrectangle containing facilities. Cutting close to the middle of this subrectangle with a line of length $L$ should, in a probabilistic sense, mostly dissect segments from the optimal solution of length $\Omega(L)$, forcing them to deflect by $L/m = \varepsilon L$. A number of complications arise from the fact that a segment may be cut by both horizontal and vertical dissection lines. We note that the cost of "guessing" the empty area is incorporated into the size of the dynamic programming lookup table by trying all possible configurations. Given the preeminence of recursive dissection in approximation schemes for Euclidean problems ([1,2,3,17]) we believe that the adaptive dissection technique is of independent interest and can prove useful in other geometric problems as well.

Although the adaptive dissection technique succeeds in reducing the required number of portals to $O(1/\varepsilon)$ and thus drastically improves the dependence of the running time on $1/\varepsilon$, the dynamic program still has to enumerate all possible rectangles. Compared to the algorithm in [3] we apparently have to enumerate even more rectangles due to the "guess" for the areas without facilities. We further reduce the size of the lookup table by showing that the boundaries of the possible rectangles can be appropriately spaced and still capture the structure of a near-optimal solution.

2 Preliminaries 

An edge (u, v) is a line segment connecting input points u and v. In the context of a 
solution, i.e. a selection of k medians, an assignment edge is an edge (u, v) such that 
one of the points is the median closest to the other in the solution. Unless otherwise 
specified, the sidelength of a rectangle with sides parallel to the axes is the length of its 
largest side. 

We assume that the input points are on a unit grid of size polynomial in the number of input points. This assumption is enforced by a preprocessing phase described in [3] incurring an additive error of $O(1/n^c)$ times the optimal value, for some constant $c > 0$. The preprocessing phase can be implemented by a plane sweep in conjunction with the minmax diameter algorithm of Gonzalez [11]. The latter can be implemented to run in $O(n \log k)$ time [10]. The total preprocessing time is $O(n \log n)$.

3 The Structure Theorem 

In this section we prove our basic Structure Theorem that shows the existence of approximately optimal solutions with a simple structure. Our exposition focuses on 2-dimensional Euclidean instances. It is easy to generalize to $d$ dimensions. Given a set of points $S$ and a set of facilities $F \subseteq S$, we define the greedy cost of the solution to be the cost of assigning each point to its closest facility. We proceed to define a recursive randomized decomposition. The decomposition has two processes: the Sub-Rectangle process and the Cut-Rectangle process.

Sub-Rectangle: 

Input: a rectangle B containing at least one facility. 

Process: Find the minimal rectangle $B'$ containing all the facilities. Let its maximum sidelength be $S$. Grow the rectangle by $S/3$ in each dimension. We call the grown rectangle $B''$.

Output: $B_s = B'' \cap B$.

Notice that $B \setminus B_s$ contains no facilities.
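A minimal sketch of the Sub-Rectangle process follows. It assumes axis-parallel rectangles stored as `(x0, y0, x1, y1)` tuples and reads "grow by $S/3$ in each dimension" as extending every side outward by $S/3$; both the representation and that reading are assumptions made for illustration.

```python
def sub_rectangle(B, facilities):
    """Return B_s: the facilities' bounding box grown by S/3 per side, clipped to B."""
    x0, y0, x1, y1 = B
    xs = [f[0] for f in facilities]
    ys = [f[1] for f in facilities]
    bx0, bx1 = min(xs), max(xs)
    by0, by1 = min(ys), max(ys)
    S = max(bx1 - bx0, by1 - by0)  # maximum sidelength of the minimal rectangle B'
    g = S / 3.0                    # amount by which each side is pushed outward
    # Intersect the grown rectangle B'' with the input rectangle B.
    return (max(x0, bx0 - g), max(y0, by0 - g), min(x1, bx1 + g), min(y1, by1 + g))
```

For example, with `B = (0, 0, 30, 30)` and facilities at `(9, 12)`, `(15, 12)`, `(12, 18)`, the bounding box has sidelength 6 and the output is `(7, 10, 17, 20)`.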

Cut-Rectangle: 

Input: a rectangle B containing at least one facility. 

Process: Randomly cut the rectangle into two rectangles with a line that is orthogonal 
to the maximal side in the middle third of the rectangle. 

Output: The two rectangles. 
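The Cut-Rectangle process can be sketched in the same style. The rectangle representation `(x0, y0, x1, y1)` and the uniform choice of the cut position within the middle third are assumptions made for this illustration.

```python
import random

def cut_rectangle(B, rng):
    """Cut B orthogonally to its longer side at a random position in the middle third."""
    x0, y0, x1, y1 = B
    width, height = x1 - x0, y1 - y0
    if width >= height:
        # vertical cut, somewhere in the middle third of the horizontal extent
        c = rng.uniform(x0 + width / 3.0, x1 - width / 3.0)
        return (x0, y0, c, y1), (c, y0, x1, y1)
    # horizontal cut, somewhere in the middle third of the vertical extent
    c = rng.uniform(y0 + height / 3.0, y1 - height / 3.0)
    return (x0, y0, x1, c), (x0, c, x1, y1)
```

Keeping the cut inside the middle third guarantees that both pieces retain at least a third of the longer side, which the lemmas below rely on.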

The recursive method alternately applies the Sub-Rectangle and Cut-Rectangle processes to produce a decomposition of the original rectangle containing the input. We remark that the original rectangle is not necessarily covered by leaf rectangles in the decomposition, due to the Sub-Rectangle steps.

We place $m + 1$ evenly spaced points on each side of each rectangle in the decomposition, where $m$ will be defined later and depends on the accuracy of the sought approximation. We call these points portals. We define a portal-respecting path between two points to be a path between the two points that crosses rectangles enclosing one of the points only at portals. We define the portal-respecting distance between two points to be the length of the shortest portal-respecting path between the points. We begin by giving three technical lemmata which will be of use in the main Structure Theorem.

Lemma 1. If the maximal rectangle $R$ separating points $v$ and $w$ has sidelength $D$, the difference between the portal-respecting distance and the geometric distance between $v$ and $w$ is $O(D/m)$.

We define a cutting line segment in the decomposition to be (i) either a line segment $\ell$ that is used in the Cut-Rectangle procedure to divide a rectangle $R$ into two rectangles or (ii) a line segment $\ell$ used to form the boundary of a sub-rectangle $R$ in the Sub-Rectangle procedure. In both cases we say that $\ell$ cuts $R$. We define the sidelength of a cutting line $\ell$ as the sidelength of the rectangle cut by $\ell$. Observe that the length of a cutting line is upper-bounded by its sidelength.

Lemma 2. If any two parallel cutting line segments produced by the application of Cut-Rectangle are within distance $L$, one of the line segments has sidelength at most $3L$.

Proof. Let $\ell_1, \ell_2$ be the two cutting segments at distance $L$. Assume wlog that they are both vertical, $\ell_1$ is the longer of the two lines, $\ell_1$ is on the left of $\ell_2$ and cuts a rectangle $R$ of sidelength greater than $3L$ into $R_1$ and $R_2$. Then $\ell_1$ is produced first in the decomposition. Thus $\ell_2$ is contained within $R_2$ (see Fig. 1a), and since it comes second it can only cut a rectangle $R'_2$ contained within $R_2$. By the definition of Cut-Rectangle, if $s$ is the sidelength of $R'_2$, $\ell_2$ is drawn at least $s/3$ away from the left boundary of $R'_2$, which implies $s/3 \le L$. Thus $s \le 3L$. □

The next lemma relates the length of a cutting line segment produced by Sub-Rectangle 
to the length of assignment edges it intersects. 

Lemma 3. If a cutting line segment $X$ produced by Sub-Rectangle intersects an assignment edge $(v, f)$ of length $D$, $X$ has sidelength at most $5D$.

Proof. Let $R$ be the rectangle cut by $X$. Observe that $f$ must be contained in $R$. Let $s$ be the sidelength of the rectangle $R$. Wlog assume that the horizontal dimension of $R$ is maximal. By the definition of Sub-Rectangle there is $y \le s$ such that $s = (5/3)y$ and the two vertical strips of width $y/3$ at the sides of $R$ are empty of facilities. Therefore $y/3 \le D$, which implies that $s \le 5D$. □





Fig. 1. (a) Demonstration of Lemma 2. (b) Case A2 in the proof of Theorem 4.



We assign each point to the closest facility under the portal respecting distance 
function. The modified cost of the assignment is the sum over all the points of the 
portal-respecting distances to their respective assigned facility. 

Theorem 4. (Structure Theorem) The expected difference between the modified cost and the greedy cost, $C$, of the facility location problem is $O(C/m)$.

Proof of theorem: By linearity of expectation, it suffices to bound the expected cost increase for a given assignment edge. For a point $v$ we define $f$ as $v$'s closest facility and assume $f$ to be to the right of $v$ (without loss of generality). We define $l$ as the closest facility to the left of $v$. We denote the distance from $v$ to $f$ by $D$, and the distance from $v$ to $l$ by $L$. The idea behind the analysis of the portal-respecting solution is that assigning $v$ to either $f$ or $l$ (based on the amount by which the decomposition distorts each distance) will be enough to show near-optimal modified cost.

We assume without loss of generality that v and / are first separated by a vertical 
cutting line. (We turn the configuration on its side and do the same argument if this 
condition does not hold.) 

The semicircle of diameter 2L centered at v and lying entirely to the left of the 
vertical line containing v is empty in its interior. Therefore we obtain: 

Lemma 5. In the decomposition, v and f are not separated for the first time by a cutting
line of sidelength in the interval (8D, L/2).

Proof of lemma: We know that v and l are separated by the time the sidelength of any
enclosing rectangle is at most L. Observe that by Lemma 3, a cutting line of sidelength
> 8D separating v and f can only be produced by Cut-Rectangle. For any rectangle of
sidelength L/2 containing v, there are no facilities to the left of v. Thus, by the
Sub-Rectangle process, the left boundary of any rectangle of sidelength s < L/2
containing v is within distance s/5 of v. By the Cut-Rectangle process, any cutting
line for a rectangle is at least s/3 to the right of its left boundary, which implies
that the cutting line is at least s/3 - s/5 = 2s/15 to the right of v. Thus, the line
segment of length D cannot be cut until the box size is at most (15/2)D. □

We proceed to a case analysis based on which of the two edges (v, l) or (v, f) is
separated first by the decomposition. Observe that (v, l) can be separated for the first
time by either a vertical or a horizontal line.

CASE A. Edge (v, l) is separated first. If v and f are first separated by a line produced
by Sub-Rectangle, the increase in cost is O(1/m) · 5D by Lemma 3. Therefore we can
assume in the examination of CASE A that v and f are first separated by a vertical cutting
line produced by Cut-Rectangle. The following calculation is straightforward and will
be of use. Let E(A, Z) denote the event that an edge of length A is separated by a cutting
line produced by Cut-Rectangle of sidelength Z. Then $\Pr[E(A, Z)] \le 3A/Z$.

We will now calculate the expectation of the cost increase for the two possible
subcases. Let $\mu$, $\lambda$ denote the lines separating for the first time (v, l) and (v, f)
respectively.

CASE A1. Edge (v, l) is separated for the first time by a vertical cutting line. With some
probability p, $\lambda$ has sidelength L/2 or more. By Lemma 5, $\lambda$ has sidelength at most
8D with probability (1 - p). Therefore, by Lemma 1, the cost increase is O(D/m)
with probability (1 - p). Now we turn to the case in which $\lambda$ has length L/2 or more.
If $\mu$ is produced by Sub-Rectangle, by Lemma 3, $\mu$ has sidelength at most 5L. If $\mu$ is
produced by Cut-Rectangle, by Lemma 2 either $\lambda$ or $\mu$ has length at most 3L. By Lemma
1 the cost increase is O(L/m) regardless of the operation producing $\mu$. Probability
p is at most c(2D/L)(1 + 1/2 + 1/4 + ...) = O(D/L) for an appropriate constant
c. Therefore the expected cost increase is at most (1 - p)O(D/m) + pO(L/m) =
O(D/m) + O((D/L)(L/m)) = O(D/m).

Remark: The probability calculation for Case A1 depended only on the choice of which
vertical line first cuts (v, f). This will be useful in Section 4 when we restrict our choices
for vertical cutting lines.

CASE A2. Edge (v, l) is separated for the first time by a horizontal cutting line. We
compute first the expectation of the cost increase conditioned upon the sidelength X of
line $\mu$. See Figure 1b. Edge (v, f) is cut by a line of sidelength Y with probability at most
3D/Y. Observe that this is true regardless of the value of X. Moreover, $L/2 \le Y \le X$
or, by Lemma 5, $Y \le 8D$. The upper bound of X holds since (v, f) is contained in
the rectangle of sidelength X that contains v, f and l. A consequence of the latter
fact is that $\mu$ cannot have been produced by a Sub-Rectangle operation. The conditional
expected cost increase is bounded by
$\sum_{L/2 \le Y \le X,\ Y = 2^i} O(D/Y)\,(Y/m) = O((D/m) \log(X/L))$.
We now remove the conditioning on X. Line $\mu$ is produced by
Cut-Rectangle, thus it has sidelength X with probability at most 3L/X. The expectation
of the cost increase is $\sum_{X} (3L/X) \cdot O((D/m) \log(X/L)) = O(D/m)$.

CASE B. Edge (v, f) is separated first. The analysis is similar to Case A above with two
subcases B1, B2 based on whether (v, l) is separated by a vertical or a horizontal line. □



4 Modifying the Structure Theorem 

The Structure Theorem in the previous section demonstrates that a
portal-respecting $(1 + \epsilon)$-optimal solution exists while only placing $O(1/\epsilon)$ portals on
the boundary of the decomposition rectangles. Using ideas from [3], Theorem 4 by itself
would suffice to construct a dynamic programming algorithm running in 0(2^^^kn^) 
time. In this section we show how to effectively bound the number of rectangles to be 
enumerated by the dynamic program and thus obtain a nearly linear-time algorithm. 

We give first some definitions. Consider the rectangle of sidelength N that surrounds
the original input. We assume that $N = m2^p$ for some integer p. Then, we call the vertical
(horizontal) lines that are numbered 0 mod $2^{i-1}$, $1 \le i \le p$, starting from the left (top)
i-allowable. Note that all the lines are 1-allowable, the top and leftmost lines are
p-allowable, and that any j-allowable line is i-allowable for all i < j. A rectangle R of
sidelength s is allowable if the boundaries lie on t-allowable lines where t is the maximal
value such that $2^t \le s/m$.

We modify the Sub-rectangle and Cut-rectangle processes as follows. 

Sub-rectangle: 

Input: An allowable rectangle containing at least one facility. 

Process: Perform the sub-rectangle process of the previous section. 

Output: the minimal allowable rectangle that contains the rectangle computed in the 
Process. 

Cut-rectangle: 

Input: An allowable rectangle containing at least one facility. 

Process: Choose a cutting line in the middle third of the rectangle uniformly among all 
lines that produce two allowable subrectangles. 

Output: The two allowable subrectangles. 

Notice that the decomposition is essentially the same as before. The primary difference
is that the randomization in the cut-rectangle process is diminished. If we add back
some randomization up front by shifting the original rectangle surrounding the input,
we can get the same result as in the Structure Theorem on the expected increased cost
of a portal-respecting solution.

Theorem 6. If the original rectangle is randomly shifted by (a, b) where a and b are
chosen independently and so that each line is equally likely to be l-allowable, then the
expected difference between the modified cost and the greedy cost, C, of the k-median
problem is O(C/m).

Proof sketch: All the deterministic lemmata from Section 3 continue to hold in the 
restricted version of the decomposition. The randomized portion in the proof of Theorem 
4 only reasons about two types of events. Moreover, it reasons about each event in 
isolation (cf. Cases A1 and A2 in the proof). Thus, we need only be concerned with 
the probabilities of each event. The random shift up front along with the randomization 
inside the process will ensure that these two types of events occur in our allowable 
decomposition procedure with approximately the same probability as in the previous 
decomposition. 

The first event (cf. Case A1) is that a cut-rectangle line of length X cuts a line segment
of length D. The probability of this event is required to be at most 3D/X in the proof of
the Structure Theorem. We show that this continues to hold, albeit with a constant larger
than 3.

We assume for simplicity that the line segment is in a rectangle R of size exactly
$X = m2^t$ at some point. (This will be true to within a constant factor.) If $D < 2^t$, we
know that the segment is cut by at most one t-allowable line. The probability of this
event is at most $D/2^t$ due to the random shift. The cut-rectangle chooses from m/3
t-allowable lines uniformly at random. Thus, the line segment is cut with probability
1/(m/3) times $D/2^t$, which is 3D/X as required. If $D \ge 2^t$, we notice that D intersects
at most $\lceil D/2^t \rceil \le 2D/2^t$ lines. By the union bound the probability that this line segment
is cut during Cut-Rectangle on R is upper bounded by $2D/2^t$ times 3/m, i.e., 6D/X.

The second event (cf. Case A2) is the intersection of two events: a horizontal line
of length X cuts a segment of length L and a vertical line of length Y cuts a segment
of length D. In the proof of the Structure Theorem, the probability was assumed to be
upper bounded by the product of the probability bounds of the two events, i.e., 3L/X
times 3D/Y.

For our restricted decomposition, the probability of each event in isolation can be
bounded by 6L/X and 6D/Y as argued above. Moreover, we chose the horizontal
and vertical shifts independently and we choose the horizontal and vertical cut-lines in
different sub-rectangle processes. Thus, we can also argue that the probability of the two
events is at most 6L/X times 6D/Y. □

We now prove a lemma bounding the number of allowable rectangles. 

Lemma 7. The number of allowable rectangles that contain l or more points is
$O(m^4 (n/l) \log n)$.

Proof. Our proof uses a charging argument. Let $R_l$ be a rectangle on the plane that has
minimum sidelength, say L, and contains l points. We bound the cardinality of the set $S_l$
of allowable rectangles which are distinct from $R_l$, contain at least l points and have at
least one point in common with $R_l$. Let $R_a$ be such a rectangle. Then $R_a$ has sidelength
at least L, otherwise it would have been chosen instead of $R_l$.

We bound the number of allowable rectangles in $S_l$ with sidelength $X \in [2^{i-1}, 2^i]m$
by $O(m^4)$ as follows. The corners must fall on the intersection of two i-allowable lines
that are within distance X of $R_l$. The number of i-allowable lines that are within distance
X of $R_l$ is $O(X/2^{i-1}) = O(m)$ since $X \ge L$. Thus, the number of corner choices is $O(m^2)$. Two
corners must be chosen, so the number of rectangles in $S_l$ of sidelength $X \in [2^{i-1}, 2^i]m$
is $O(m^4)$. Since there are O(log n) values of i, $|S_l| = O(m^4 \log n)$.

Now, we remove $R_l$ and its points from the decomposition, and repeat the argument
on the remaining n - l points. The number of repetitions until no points are left is O(n/l),
therefore by induction, we get a bound of $O(m^4 (n/l) \log n)$ on the number of allowable
rectangles that contain at least l points. □

5 The Dynamic Program 

We have structural theorems relative to a particular decomposition. Unfortunately, the
decompositions are defined with respect to the facility locations. However, in reality 
they only use the facility locations in the Sub-rectangle steps. Moreover, the number 
of subrectangles is at most polynomial in the size of the original rectangle. Indeed, the 
number of allowable subrectangles is polynomial in m and n. Thus, we can perform 
dynamic programming to find the optimal solution. The structure of the lookup table is 
similar to the one used in [3]. We exploit our Structure Theorem and the analysis on the
total number of allowable rectangles to obtain a smaller number of entries. 

The table will consist of a set of entries for each allowable rectangle that contains at 
least one point. For each allowable rectangle, we will also enumerate the following 

- the number of facilities in the rectangle, 

- a distance for each portal to the closest facility in the rectangle and 

- a distance for each portal to the closest facility outside the rectangle. 

Actually, we will only approximate the distances to the nearest facility inside and
out of the rectangle to a precision of s/m for a rectangle of sidelength s. Moreover, we
do not consider distances of more than 10s. Finally, the distance value at a portal only
changes by a constant from the distance value at an adjacent portal. This will allow us
to bound the total number of table entries by $k2^{O(m)}$ for each allowable rectangle. See
[3] for further details on the table construction.

We can compute the entries for a rectangle of sidelength s by looking at either all
the subrectangles from the Sub-rectangle process or by looking at all ways of cutting the
rectangle into allowable smaller rectangles. This is bounded by $O(2^{O(m)})$ time per table
entry. We bound the table size by noting that the total number of allowable rectangles
is, by Lemma 7, at most $O(m^4 n \log n)$. Thus, we can bound the total number of entries
in the table by $O(k2^{O(m)} n \log n)$.

We can improve this to $O(2^{O(m)} n \log k \log n)$ as follows. If an allowable rectangle
contains fewer than l < k nodes, we only need to keep $l2^{O(m)}$ entries for it, since at
most l facilities can be placed inside it. Moreover, the number of allowable rectangles
containing between l and 2l points is shown by Lemma 7 to be $O(m^4 (n/l) \log n)$. We
can now bound the total number of entries by

$k2^{O(m)} O(m^4 (n/k) \log n) + \sum_{l = 2^i,\ l < k} l2^{O(m)} O(m^4 (n/l) \log n) = O(2^{O(m)} n \log k \log n)$.
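To see where the log k factor comes from, the sum can be evaluated scale by scale. The
following is only a sketch, under the assumptions that l ranges over powers of two,
$l = 2^j$ (so that each rectangle containing between l and 2l points is counted at one
scale), and that the polynomial factor $m^4$ is absorbed into $2^{O(m)}$:

```latex
\begin{align*}
\sum_{j=1}^{\lceil \log_2 k \rceil} 2^{j}\, 2^{O(m)} \cdot
    O\!\left(m^{4}\,\frac{n}{2^{j}}\,\log n\right)
  &= \sum_{j=1}^{\lceil \log_2 k \rceil} 2^{O(m)}\, O(n \log n) \\
  &= O\!\left(2^{O(m)}\, n \log k \log n\right).
\end{align*}
```

Each scale contributes the same amount because the factor l in the number of entries per
rectangle cancels the factor 1/l in the number of rectangles.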

We are now ready to state the main result of the paper.

Theorem 8. Given an instance of the k-median problem in the 2-dimensional Euclidean
space, and any fixed m > 0, there is a randomized algorithm that computes a (1 + 1/m)-approximation,
in expectation, with worst case running time $O(2^{O(m)} n \log k \log n)$.

Repeating the algorithm O(log n) times gives a (1 + O(1/m)) approximation guarantee
with probability 1 - o(1). The algorithm can easily be extended to instances in
the d-dimensional Euclidean space. We omit the details.

Theorem 9. Given an instance of the k-median problem in the d-dimensional Euclidean
space, and any fixed m > 0, there is a randomized algorithm that computes a (1 + 1/m)-approximation,
in expectation, with worst case running time
$O(2^{O(m^d)} n \log k \log n)$.

For the uncapacitated facility location problem we do not need to keep track of the
number of facilities open for subrectangles, hence we obtain an approximation scheme
with running time $O(2^{O(m)} n \log n)$. We omit the details.

Abstract. Scheduling dependent jobs on multiple machines is modeled by the
graph multi-coloring problem. In this paper we consider the problem of minimizing 
the average completion time of all jobs. This is formalized as the sum multi- 
coloring (SMC) problem: Given a graph and the number of colors required by 
each vertex, find a multi-coloring which minimizes the sum of the largest colors 
assigned to the vertices. It reduces to the known sum coloring (SC) problem in the 
special case of unit execution times. 

This paper reports a comprehensive study of the SMC problem, treating three 
models: with and without preemption allowed, as well as co-scheduling where 
tasks cannot start while others are running. We establish a linear relation between
the approximability of the maximum independent set (IS) and SMC in all
three models, via a link to the SC problem. Thus, for classes of graphs where
IS is $\rho$-approximable, we obtain $O(\rho)$-approximations for preemptive and
co-scheduling SMC, and $O(\rho \cdot \log n)$ for non-preemptive SMC. In addition, we give
constant-approximation algorithms for SMC under different models, on a number
of fundamental classes of graphs, including bipartite, line, bounded degree, and
planar graphs.



1 Introduction 

Any multi-processor system has certain resources, which can be made available to one 
job at a time. A fundamental problem in distributed computing is to efficiently schedule
jobs that are competing on such resources. The scheduler has to satisfy the following two
conditions: (i) mutual exclusion: no two conflicting jobs are executed simultaneously;
(ii) no starvation: the request of any job to run is eventually granted. The problem is
well-known in its abstracted form as the dining/drinking philosophers problem (see,
e.g., [D68,L81]).

Scheduling dependent jobs on multiple machines is modeled as a graph coloring 
problem, when all jobs have the same (unit) execution times, and as graph multi-coloring 
for arbitrary execution times. The vertices of the graph represent the jobs and an edge in 
the graph between two vertices represents a dependency between the two corresponding 
jobs, that forbids scheduling these jobs at the same time. More formally, for a weighted
undirected graph G = (V, E) with n vertices, let the length of a vertex v be a positive
integer denoted by x(v), also called the color requirement of v. A multi-coloring of the
vertices of G is a mapping into the power set of the positive integers, $\Psi : V \to 2^{\mathbb{N}}$.
Each vertex v is assigned a set of x(v) distinct numbers (colors), and adjacent vertices
are assigned disjoint sets of colors.
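The two conditions in this definition are easy to check mechanically. The following is a
minimal sketch, not taken from the paper: the function name `is_valid_multicoloring`
and the dictionary-based graph representation are assumptions made for illustration.

```python
def is_valid_multicoloring(edges, x, psi):
    """Check the two conditions of a multi-coloring.

    edges: iterable of (u, v) pairs of an undirected graph
    x:     dict mapping each vertex to its color requirement x(v)
    psi:   dict mapping each vertex to its set of positive integer colors
    """
    # Each vertex must receive exactly x(v) distinct positive colors.
    for v, need in x.items():
        colors = psi.get(v, set())
        if len(colors) != need or any(c < 1 for c in colors):
            return False
    # Adjacent vertices must receive disjoint color sets.
    for u, v in edges:
        if psi[u] & psi[v]:
            return False
    return True


# Path a - b - c with requirements 2, 1, 2.
edges = [("a", "b"), ("b", "c")]
x = {"a": 2, "b": 1, "c": 2}
psi = {"a": {1, 2}, "b": {3}, "c": {1, 2}}
print(is_valid_multicoloring(edges, x, psi))
```

Non-adjacent vertices such as a and c may share colors, which is what lets a schedule run
independent jobs in parallel.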

The traditional optimization goal is to minimize the total number of colors assigned 
to G. In the setting of a job system, this is equivalent to finding a schedule, in which 
the time until all the jobs complete running is minimized. Another important goal is 
to minimize the average completion time of the jobs, or equivalently, to minimize the 
sum of the completion times. In the sum multi-coloring (SMC) problem, we look for a
multi-coloring $\Psi$ that minimizes $\sum_{v \in V} f_\Psi(v)$, where the completion time $f_\Psi(v)$ is the
maximum color assigned to v by $\Psi$. We study the sum multi-coloring problem in three
models:

- In the Preemption model (p-SMC), each vertex may get any set of colors. 

- In the No-Preemption (np-SMC) model, the set of colors assigned to each vertex 
has to be contiguous. 

- In the Co-Scheduling model (co-SMC), the vertices are colored in rounds: in each
round the scheduler completely colors an independent set in the graph.

The Preemption model corresponds to the scheduling approach commonly used
in modern operating systems [SG98]: jobs may be interrupted during their execution
and resumed at a later time. The No-Preemption model captures the execution model
adopted in real-time systems, where scheduled jobs must run to completion. The Co- 
Scheduling approach is used in some distributed operating systems [T95]. In such 
systems, the scheduler identifies subsets of cooperating processes, that can benefit from 
running at the same time interval (e.g., since the processes in the set communicate 
frequently with each other); then, each subset is executed simultaneously on several 
processors, until all the processes in the subset complete. 

The SMC problem has many other applications, including traffic intersection control, 
session scheduling in local-area networks (see, e.g., [IL97]), compiler design and VLSI
routing [NSS94]. 

Related Work 

When all the color requirements are equal to 1, the problem, in all three models,
reduces to the previously studied sum coloring (SC) problem (a detailed survey
appears in [BBH+98]). A good sum coloring would tend to color many vertices early; the
following natural heuristic was proposed in [BBH+98].

MAXIS: Choose a maximum independent set in the graph, color all of its vertices with
the next available color; iterate until all vertices are colored.

This procedure was shown to yield a 4-approximation, and additionally, in the case that
IS can only be approximated within a factor of $\rho$, it gives a $4\rho$-approximation.
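As an illustration of the MAXIS heuristic, here is a small sketch. It is not the paper's
implementation: the maximum independent set is found by exhaustive search (exponential
time, suitable only for tiny examples), and the function names and adjacency-dictionary
representation are assumptions.

```python
from itertools import combinations


def max_independent_set(vertices, adj):
    """Brute-force maximum independent set among `vertices` (illustration only)."""
    verts = sorted(vertices)
    for size in range(len(verts), 0, -1):
        for cand in combinations(verts, size):
            if all(v not in adj[u] for u, v in combinations(cand, 2)):
                return set(cand)
    return set()


def maxis_sum_coloring(adj):
    """Repeatedly color a maximum independent set with the next available color."""
    remaining = set(adj)
    color = {}
    next_color = 0
    while remaining:
        next_color += 1
        chosen = max_independent_set(remaining, adj)
        for v in chosen:
            color[v] = next_color
        remaining -= chosen
    return color


# Path a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
coloring = maxis_sum_coloring(adj)
print(coloring, sum(coloring.values()))
```

On graph classes where a maximum independent set can be found in polynomial time, the
exhaustive search would be replaced by the corresponding exact algorithm.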

On the other hand, SC also shares the inapproximability of IS, which immediately
carries over to SMC: SC cannot be approximated in general within $n^{1-\epsilon}$, for any $\epsilon > 0$,
unless NP = ZPP [FK96,BBH+98]. It is also NP-hard to approximate within some
factor c > 1 on bipartite graphs [BK98].

Our Results 

This paper reports a comprehensive study of the SMC problem. We detail below our
main results, and summarize further results that will appear in the full version of this
paper.

General Graphs: Our central results are in establishing a linear relation between the
approximability of the maximum independent set (IS) and SMC in all three models, via
a link to the SC problem. For classes of graphs where IS is solvable, we obtain a
16-approximation to both pSMC and coSMC, as well as an O(log min(n, p))-approximation
to npSMC. For classes of graphs where IS is $\rho$-approximable, these ratios translate to
$O(\rho)$-approximations for pSMC and coSMC, and $O(\rho \cdot \log n)$ for npSMC.

Important Special Classes of Graphs: We also study special classes of graphs. For
pSMC, we describe a 1.5-approximation algorithm for bipartite graphs. We generalize
this to a (k + 1)/2-ratio approximation algorithm for k-colorable graphs, when the
coloring is given. Also, we present a $(\Delta + 2)/3$ approximation algorithm for graphs of
maximum degree $\Delta$, and a 2-approximation for line graphs.

For npSMC, we describe a 2.796-approximation algorithm for bipartite graphs, and
a (1.55k + 1)-approximation algorithm when a k-coloring of the graph is given. These
bounds are absolute, i.e. independent of the cost of the optimal solution, and also have
the advantage of using few colors.

Comparison of Multi-coloring Models: We explore the relationship among the three
models and give a construction which indicates why the np-SMC model is “harder”.
Namely, while finding independent sets iteratively suffices to approximate both the
p-SMC and the co-SMC problems, any such solution must be $\Omega(\log p)$ off for the np-SMC
problem.

Performance Bounds for the MAXIS Algorithm: An immediate application of the 
MAXIS algorithm for the SC problem appears in the O(1)-approximation algorithm
for the p-SMC problem. Furthermore, most of our algorithms reduce to MAXIS for 
the SC problem, if the color requirements are uniform. It is therefore natural to find 
the exact performance of MAXIS for the SC problem. In [BBH+98] it was shown that 
this algorithm yields a 4-approximation for the sum coloring problem. We give here a 
construction which shows, that MAXIS cannot achieve an approximation factor better 
than 4. 

As for the SMC problem, we note that it is usually preferable to color many vertices 
early, even if it means that more colors will be needed in the end. Thus, the MAXIS 
algorithm is a natural candidate heuristic also for the SMC problems. However, we 
show that MAXIS is only an approximation algorithm for p-SMC, where p is 

the largest color requirement in the graph. For the np-SMC problem, its performance 
cannot even be bounded in terms of p. 

Further Results: Our results for the SMC problem can be extended to apply also to (i)
the weighted SMC problem, where each vertex v is associated with a weight w(v) and
the goal is to minimize $\sum_{v \in V} w(v) \cdot f_\Psi(v)$, and (ii) on-line scheduling of dependent
jobs. A summary of our results for these problems will be given in the full version of the
paper.

Organization of the Extended Abstract: 

Due to space limitations, we include in this abstract only the main results of our study.
Detailed proofs are in the full version of the paper [BHK+98]. In Section 2 we introduce 
some notation and definitions. Section 3 presents approximation algorithms for the p- 
SMC problem. Section 4 describes the results for the np-SMC problem, and Section 5 




Sum Multi-coloring of Graphs 393 



discusses the co-SMC problem. Finally, in Section 6 we briefly present our lower bound 
of 4, for the MAXIS algorithm. 

2 Definitions and Notation 

For a given undirected graph G = (V, E) with n vertices, and the mapping $x : V \to \mathbb{N}$,
we denote by $S(G) = \sum_{v \in V} x(v)$ the sum of the color requirements of the vertices in G.
We denote by p the maximum color requirement in G, that is $p = \max_{v \in V} x(v)$. An
independent set in G is a subset I of V such that any two vertices in I are non-adjacent.

Given a multi-coloring $\Psi$ of G, denote by $C_i$ the independent set that consists of vertices
v with $i \in \Psi(v)$, by $c_1^\Psi(v), \ldots, c_{x(v)}^\Psi(v)$ the collection of x(v) colors assigned to v
(in increasing order), and by $f_\Psi(v) = c_{x(v)}^\Psi(v)$ the largest color assigned to v. The
multi-color sum of G with respect to $\Psi$ is $\mathrm{SMC}(G, \Psi) = \sum_{v \in V} f_\Psi(v)$.
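For a concrete feel for these quantities, the short sketch below evaluates the sum of
requirements, the maximum requirement and the multi-color sum on a three-vertex path.
The helper name `multi_color_sum` and the data layout are illustrative assumptions.

```python
def multi_color_sum(psi):
    """Sum over all vertices of the largest color assigned to the vertex."""
    return sum(max(colors) for colors in psi.values())


# Path a - b - c with requirements 2, 1, 2 and a valid multi-coloring.
x = {"a": 2, "b": 1, "c": 2}
psi = {"a": {1, 2}, "b": {3}, "c": {1, 2}}

total_requirement = sum(x.values())   # sum of the color requirements
max_requirement = max(x.values())     # largest color requirement
smc = multi_color_sum(psi)            # multi-color sum of psi
print(total_requirement, max_requirement, smc)
```

Here the multi-color sum is 2 + 3 + 2 = 7, which is at least the sum of requirements 5,
as every vertex needs at least x(v) colors before it completes.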

A multi-coloring $\Psi$ is contiguous (non-preemptive), if for any v, the colors assigned
to v satisfy $c_{i+1}^\Psi(v) = c_i^\Psi(v) + 1$ for $1 \le i < x(v)$. In the context of scheduling, this
means that all the jobs are processed without interruption. A multi-coloring $\Psi$ solves the
co-scheduling problem, if the set of vertices can be partitioned into k disjoint independent
sets $V = I_1 \cup \cdots \cup I_k$ with the following two properties: (i) $c_1^\Psi(v) = c_1^\Psi(v')$ for any
$v, v' \in I_j$, for $1 \le j \le k$. (ii) $c_{x(v)}^\Psi(v) < c_1^\Psi(v')$ for all $v \in I_j$ and $v' \in I_{j+1}$, for
$1 \le j < k$. In the context of scheduling, this means scheduling to completion all the
jobs corresponding to $I_j$, and only then starting to process the jobs in $I_{j+1}$, for all $1 \le j < k$.
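The two restricted models can be tested on a given multi-coloring as follows. This is a
sketch under the assumption that the input is already a valid multi-coloring with
non-empty color sets; the function names are hypothetical. For co-scheduling, vertices
are grouped by their first color, which is the only partition that can satisfy
property (i).

```python
def is_contiguous(psi):
    """True if every vertex receives a block of consecutive colors."""
    return all(max(c) - min(c) + 1 == len(c) for c in psi.values())


def is_coscheduling(psi):
    """True if the rounds (groups sharing a first color) never overlap in time."""
    groups = {}
    for v, colors in psi.items():
        groups.setdefault(min(colors), []).append(v)
    starts = sorted(groups)
    for current, following in zip(starts, starts[1:]):
        # Every vertex of the current round must finish before the next round starts.
        latest_finish = max(max(psi[v]) for v in groups[current])
        if latest_finish >= following:
            return False
    return True


rounds = {"a": {1, 2}, "c": {1}, "b": {3}}
overlap = {"a": {1, 2}, "b": {2, 3}}
print(is_contiguous(rounds), is_coscheduling(rounds), is_coscheduling(overlap))
```

Independence of each round does not need a separate check here, since vertices that
share a color in a valid multi-coloring are non-adjacent.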

The minimum multi-color sum of a graph G, denoted by pSMC(G), is the minimum
$\mathrm{SMC}(G, \Psi)$ over all multi-colorings $\Psi$. We denote the minimum contiguous multi-color
sum of G by npSMC(G). The minimum multi-color sum of G for the co-scheduling
problem is denoted by coSMC(G). Indeed, for any graph G:

$S(G) \le \mathrm{pSMC}(G) \le \mathrm{npSMC}(G) \le \mathrm{coSMC}(G)$.

3 The Preemptive Sum Multi-coloring Problem 

In the preemptive version of the sum multi-coloring problem, a vertex may get any 
set of x(v) colors. Our first result is an O(1)-approximation for the p-SMC problem
on general graphs. This is an extension of the result for the sum-coloring problem in 
the sense that it establishes a connection between sum multi-coloring and maximum 
weighted independent sets. Then we address several families of graphs: bipartite graphs, 
bounded degree graphs, and line graphs. 

General Graphs 

En route to approximating pSMC, we consider another measure of a multi-coloring, 
the sum of the average color value assigned to a vertex. We approximate it by reducing 
it to the sum coloring problem of a derived weighted graph. We then transform a multi- 
coloring with small sum of averages to one with small multi-color sum. 

In the weighted sum coloring problem, each vertex v has a weight w(v), and we need
to assign to each vertex v a (single) color $\Psi(v)$, so as to minimize $\sum_{v \in V} w(v)\Psi(v)$.
The weighted MAXIS algorithm, selecting an independent set with maximum weight in
each round, gives a 4-approximation [BBH+98].

We denote by $AV_\Psi(v)$ the average color assigned to v by $\Psi$, namely, $AV_\Psi(v) =
(\sum_{i \in \Psi(v)} i)/x(v)$. Let $SA_\Psi(G) = \sum_{v \in V} AV_\Psi(v)$ denote the sum of averages of $\Psi$, and
let $SA^*(G)$ be the minimum possible average sum. Clearly, $SA^*(G) \le \mathrm{pSMC}(G)$.

Given a multi-coloring instance (G, x), we construct a weighted graph (G', w) as
follows. The graph has x(v) copies of each vertex v connected into a clique, with each
copy of v adjacent to all copies of neighbors of v in G. The weight $w(v_i)$ of each copy
$v_i$ of v will be 1/x(v).

There is a one-one correspondence between multi-colorings $\Psi$ of (G, x) and colorings
of G', as the x(v) copies of a vertex v in G all receive different colors. Let $\Psi$ also
refer to the corresponding coloring of G'. Observe that

$SC_\Psi(G', w) = \sum_{v \in V} \sum_{i=1}^{x(v)} \frac{1}{x(v)} \cdot c_i^\Psi(v) = \sum_{v \in V} AV_\Psi(v) = SA_\Psi(G, x)$.
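The construction of the derived graph and the identity between the weighted coloring sum
and the sum of averages can be checked on a small instance. The sketch below is an
illustration only; the representation of a copy as a pair (v, i) and all function names
are assumptions. Exact rational arithmetic is used so that the two sides can be compared
without rounding.

```python
from fractions import Fraction


def derived_weighted_graph(edges, x):
    """Build G': x(v) copies of each v in a clique, copies of neighbors fully joined."""
    copies = {v: [(v, i) for i in range(x[v])] for v in x}
    new_edges = set()
    for cs in copies.values():
        for i in range(len(cs)):
            for j in range(i + 1, len(cs)):
                new_edges.add((cs[i], cs[j]))
    for u, v in edges:
        for cu in copies[u]:
            for cv in copies[v]:
                new_edges.add((cu, cv))
    weight = {c: Fraction(1, x[v]) for v, cs in copies.items() for c in cs}
    return new_edges, weight


def sum_of_averages(psi):
    """Sum over vertices of the average color assigned to the vertex."""
    return sum(Fraction(sum(colors), len(colors)) for colors in psi.values())


def weighted_coloring_sum(psi, weight):
    """Weighted sum coloring value on G': copy (v, i) takes the i-th smallest color of v."""
    total = Fraction(0)
    for v, colors in psi.items():
        for i, c in enumerate(sorted(colors)):
            total += weight[(v, i)] * c
    return total


edges = [("a", "b")]
x = {"a": 2, "b": 3}
psi = {"a": {1, 2}, "b": {3, 4, 5}}
new_edges, weight = derived_weighted_graph(edges, x)
print(len(new_edges), sum_of_averages(psi), weighted_coloring_sum(psi, weight))
```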

Lemma 1. Define the weight of a vertex to be 1/x(v). Suppose that a multi-coloring $\Psi$
of a graph G has the property that each color i is an independent set of weight within
a $\rho$ factor from optimal on the subgraph induced by yet-to-be fully colored vertices.
Then, $SA_\Psi(G) \le 4\rho \cdot \mathrm{pSMC}(G)$. Further, if $\Psi$ is contiguous, then $\mathrm{SMC}(G, \Psi) \le
4\rho \cdot \mathrm{pSMC}(G)$.

Proof. A coloring $\Psi$ that satisfies the hypothesis also implicitly satisfies the property
of MAXIS on (G', w). Hence, by the result of [BBH+98],

$SA_\Psi(G, x) = SC_\Psi(G', w) \le 4\rho \cdot SC^*(G', w) = 4\rho \cdot SA^*(G, x) \le 4\rho \cdot \mathrm{pSMC}(G)$.

Note that for any coloring, the final color of a vertex exceeds the average color
by at least about half the color requirement of the vertex, i.e., $f_\Psi(v) \ge AV_\Psi(v) +
(x(v) - 1)/2$, with equality holding when $\Psi$ is contiguous. Thus, $\mathrm{SMC}(G, \Psi) \ge SA_\Psi(G)
+ (S(G) - n)/2$ for every multi-coloring $\Psi$, and in particular
$\mathrm{pSMC}(G) \ge SA^*(G) + (S(G) - n)/2$. We conclude that if $\Psi$ is contiguous,
then

$\mathrm{SMC}(G, \Psi) = SA_\Psi(G) + \frac{S(G) - n}{2} \le 4\rho \cdot SA^*(G) + \frac{S(G) - n}{2} \le 4\rho \cdot \mathrm{pSMC}(G)$. □

Theorem 2. pSMC can be approximated within a factor of $16\rho$.

Proof. Given an instance (G, x), obtain a multi-coloring $\Psi$ by applying weighted
MAXIS on the derived graph G'. Then, form the multi-coloring $\Psi'$ that “doubles” each
independent set of $\Psi$:

4p'{v) = + 1 : i = cf{v),t < |~a;(u)/2] andi' = cf,{v),t' < [a;(u)/2j}. 

Observe that $\Psi'$ assigns each vertex v x(v) colors.

Let $m_v = c_{\lceil x(v)/2 \rceil}^\Psi(v)$ be the median color assigned to v by $\Psi$. The largest color used
by $\Psi'$ is $f_{\Psi'}(v) = c_{\lceil x(v)/2 \rceil}^\Psi(v) + c_{\lfloor x(v)/2 \rfloor}^\Psi(v) \le 2m_v$. Also $m_v \le 2 \cdot AV_\Psi(v)$, since less than
half the elements of a set of natural numbers can be larger than twice its average. Thus
$f_{\Psi'}(v) \le 4 \cdot AV_\Psi(v)$ and $\mathrm{SMC}(G, \Psi') \le 4 \cdot SA_\Psi(G)$. Thus, by Lemma 1, $\Psi'$ is a
$16\rho$-approximation of pSMC(G). □
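The doubling step can be illustrated as follows. This is a sketch under an assumed
numbering convention, which may differ from the one in the proof: color i of the original
multi-coloring is split into the two new colors 2i - 1 and 2i, each vertex keeps only the
smaller half of its original colors, takes both new colors for the first ⌊x(v)/2⌋ of
them, and takes one new color for the middle one when x(v) is odd. The function name is
hypothetical.

```python
import math


def double_coloring(psi):
    """Replace each original color i by 2i-1 and 2i, using only the lower half of colors."""
    doubled = {}
    for v, colors in psi.items():
        ordered = sorted(colors)
        upper = math.ceil(len(ordered) / 2)   # colors that contribute 2i - 1
        lower = len(ordered) // 2             # colors that also contribute 2i
        odd_part = {2 * c - 1 for c in ordered[:upper]}
        even_part = {2 * c for c in ordered[:lower]}
        doubled[v] = odd_part | even_part
    return doubled


psi = {"a": {1, 4, 6}, "b": {2, 3}}
doubled = double_coloring(psi)
print(doubled)
```

Under this convention every vertex still receives x(v) colors, vertices with disjoint
original color sets keep disjoint sets, and the largest new color of a vertex is at most
twice its median original color.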
Bipartite Graphs, and k-colorable Graphs

Consider graphs that can be colored with k colors, i.e., the set of vertices V can be
partitioned into k disjoint independent sets $V = V_1 \cup \cdots \cup V_k$. Consider the following
Round-Robin algorithm: For $1 \le i \le k$ and $h \ge 0$, at round $t = k \cdot h + i$ give
the color t to all the vertices of $V_i$ that still need a color. It is not hard to see that
$f(v) \le k \cdot x(v)$ for all $v \in V$. Hence, the Round-Robin algorithm is a k-approximation
algorithm. In this subsection we give a non-trivial algorithm for k-colorable graphs,
with the coloring given, whose approximation factor is at most (k + 1)/2. This gives
a 2.5-ratio approximation for planar graphs, a 2-ratio approximation for outerplanar
and series-parallel graphs, and a k/2 + 1 approximation for graphs with tree-width
bounded by k. In particular, for bipartite graphs the approximation ratio is bounded by
$1.5 - 1/(2n)$. For simplicity, the result is described for bipartite graphs only. The result
for general k is derived similarly.
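The Round-Robin baseline is simple enough to state in a few lines. The sketch below is
an illustration, with the function name and the list-of-sets representation of the given
k-coloring chosen as assumptions; classes are indexed from 0 in the code, so round t
serves the class with index (t - 1) mod k.

```python
def round_robin(classes, x):
    """classes: list of k independent sets V_1..V_k; x: color requirement per vertex."""
    k = len(classes)
    need = dict(x)
    psi = {v: set() for v in x}
    t = 0
    while any(need.values()):
        t += 1
        current = classes[(t - 1) % k]   # round t = k*h + i serves class V_i
        for v in current:
            if need[v] > 0:
                psi[v].add(t)
                need[v] -= 1
    return psi


classes = [{"a", "c"}, {"b"}]
x = {"a": 2, "b": 1, "c": 1}
psi = round_robin(classes, x)
print(psi)
```

A vertex in class $V_i$ receives colors i, k + i, 2k + i, and so on, so its last color is at
most k times its requirement, which is the bound stated above.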

We need the following definitions and notations. Let G be a bipartite graph $G(V_1, V_2, E)$
with n vertices, such that edges connect vertices in $V_1$ with vertices in $V_2$. We
denote by $\alpha(G)$ the size of a maximum independent set in G. We use the term processing
an independent set $W \subseteq V$ to mean assigning the next available color to all the vertices
of W. Suppose that the first i colors assigned by a multi-coloring $\Psi$ are distributed
among the vertices. The reduced graph of G is the graph for which the x(v) values are
decreased accordingly with the colors assigned so far, deleting vertices that were fully
colored. Finally, let $\gamma(n) = (2n^2)/(3n - 1)$.

Informally, the algorithm distinguishes the following two cases: If the size of the
maximum independent set in the current reduced graph is “large,” the algorithm chooses
to process a maximum independent set. Otherwise, if the maximum independent set is
“small,” the algorithm works in a fashion similar to Round-Robin. Once a vertex (or
a collection of vertices) is assigned its required number of colors, the algorithm
re-evaluates the situation.

Algorithm 1 BC

While some vertices remain do

1. Let G be the current reduced graph, and let n be its number of vertices.

2. If $\alpha(G) \le \gamma(n)$ do

a) Let m be the minimum x(v) in G. Assume without loss of generality that
$V_1$ contains at least as many vertices v with x(v) = m as $V_2$.

b) Give the next m colors to the remaining vertices in $V_1$, and the following
m colors to those in $V_2$.

3. else ($\alpha(G) > \gamma(n)$)

a) Choose a maximum independent set $I \subseteq V$ of size $\alpha(G)$. Let m be the
minimum x(v) value in I.

b) Give the next m colors to all the vertices in I.
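The steps of BC can be sketched in code as follows. This is an illustrative reading of
the pseudocode, not the authors' implementation: the maximum independent set is found by
exhaustive search instead of flow techniques, the threshold is taken to be
2n^2/(3n - 1), and the function names and data layout are assumptions.

```python
from itertools import combinations


def max_independent_set(vertices, adj):
    """Brute-force maximum independent set among `vertices` (illustration only)."""
    verts = sorted(vertices)
    for size in range(len(verts), 0, -1):
        for cand in combinations(verts, size):
            if all(v not in adj[u] for u, v in combinations(cand, 2)):
                return set(cand)
    return set()


def bc(v1, v2, adj, x):
    """Sketch of Algorithm BC on a bipartite graph with sides v1 and v2."""
    need = dict(x)
    psi = {v: set() for v in x}
    next_color = 1

    def give(vertices, count):
        # Assign the next `count` colors to every vertex in `vertices`.
        nonlocal next_color
        if not vertices:
            return
        for c in range(next_color, next_color + count):
            for v in vertices:
                psi[v].add(c)
        for v in vertices:
            need[v] -= count
        next_color += count

    while any(need.values()):
        alive = {v for v in need if need[v] > 0}   # the current reduced graph
        n = len(alive)
        gamma = 2 * n * n / (3 * n - 1)
        best = max_independent_set(alive, adj)
        if len(best) <= gamma:
            # Round-Robin-like step: serve the side holding more minimum-length vertices first.
            m = min(need[v] for v in alive)
            first, second = alive & v1, alive & v2
            if sum(need[v] == m for v in second) > sum(need[v] == m for v in first):
                first, second = second, first
            give(first, m)
            give(second, m)
        else:
            # Large independent set: process it for as long as its shortest vertex needs.
            m = min(need[v] for v in best)
            give(best, m)
    return psi


adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
x = {"a": 1, "b": 2, "c": 1}
psi = bc({"a", "c"}, {"b"}, adj, x)
print(psi)
```

Every iteration fully colors at least one vertex, so the loop runs at most n times.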

The algorithm runs in polynomial time, since finding a maximum independent set
in a bipartite graph can be performed in polynomial time using flow techniques (cf.
[GJ79]) and since in each iteration at least one vertex is deleted.

Theorem 3. BC approximates pSMC on bipartite graphs within a factor of 1.5.

Bounded Degree Graphs and Line Graphs

A natural algorithm for the multi-coloring problem is the Greedy (First-Fit) algorithm.
It processes the vertices in an arbitrary order, assigning to each vertex $v \in V$ the set of
the smallest $x(v)$ colors with which none of its preceding neighbors have been colored.
This method has the advantage of being on-line, processing requests as they arrive. Let
$\Delta$ denote the maximum degree of $G$.

We can show that Greedy is exactly $(\Delta + 1)$-approximate. Instead, we consider the
modified version, Sorted Greedy (SG), which orders the vertices in a non-decreasing
order of color requirements before applying Greedy. This slight change improves the
approximation ratio by a factor of nearly 3. The proof is omitted.

Theorem 4. SG provides a $(\Delta + 2)/3$-approximation to pSMC($G$), and that is tight.
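As an illustration, Sorted Greedy can be written directly from the description above. The sketch below is ours, not the authors' code: the adjacency-dictionary input format and the function names are assumptions made for the example.

```python
def sorted_greedy(adj, x):
    """Sorted Greedy (SG): First-Fit applied in non-decreasing order of x(v).

    adj maps each vertex to an iterable of its neighbors; x maps each vertex
    to its color requirement. Returns the set of colors given to each vertex.
    """
    colors = {}
    for v in sorted(adj, key=lambda u: x[u]):
        # Colors already taken by neighbors that were processed earlier.
        used = set()
        for u in adj[v]:
            used |= colors.get(u, set())
        # Take the x(v) smallest colors not used by those neighbors.
        chosen, c = set(), 1
        while len(chosen) < x[v]:
            if c not in used:
                chosen.add(c)
            c += 1
        colors[v] = chosen
    return colors


def multicolor_sum(colors):
    """Sum over all vertices of the largest color assigned (the finish time)."""
    return sum(max(cs) for cs in colors.values())
```

On a triangle with requirements 1, 2, 3 the vertices finish at times 1, 3 and 6, so the sum is 10.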

SG also has a good approximation ratio for line graphs and intersection graphs of
$k$-uniform hypergraphs.

Theorem 5. SG provides a $2 - 4/(\Delta + 4)$-approximation to pSMC($G$) on line graphs.
More generally, it provides a $k(1 - 2(k - 1)/(\Delta + 2k))$ approximation ratio on
intersection graphs of $k$-uniform hypergraphs.



Proof. Given a line graph $G$, form a graph $H$ that is a disjoint collection
$\{C_1, C_2, \ldots, C_{|V(H)|}\}$ of the maximal cliques in $G$. Add a singleton clique for each
vertex that appears only once.

The minimum contiguous multi-coloring sum of $H$ is given by ordering the vertices
of each $C_i$ in a non-decreasing order of color requirements, for

$$\mathrm{pSMC}(H) = S(H) + \sum_{(u,v) \in E(H)} \min(x(u), x(v)).$$

Observe that since each vertex in $G$ appears at most twice in $H$, $S(H) = 2S(G)$, and any
multi-coloring of $G$ corresponds to a multi-coloring of $H$ of at most double the weight,
$\mathrm{pSMC}(H) \le 2 \cdot \mathrm{pSMC}(G)$. Further, there is a one-one correspondence between the
edges of $G$ and $H$. Thus, we have

$$\mathrm{pSMC}(G) \ge S(G) + \frac{1}{2} \sum_{(u,v) \in E(G)} \min(x(u), x(v)).$$

Letting $d = 2 \sum_{(u,v) \in E(G)} \min(x(u), x(v)) / S(G)$, we bound the performance ratio by

$$\frac{\mathrm{SMC}(G, \mathrm{SG})}{\mathrm{pSMC}(G)} \le f(d) = \frac{1 + d/2}{1 + d/4}.$$

Since $f(d)$ is monotone increasing, and $d \le \Delta$, we have $f(d) \le f(\Delta) = 2 - 4/(\Delta + 4)$.
This matches the bound proved for sum coloring for regular edge graphs. □
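For reference, the closing step of the proof can be spelled out as follows. The derivation assumes the reading $d = 2\sum_{(u,v) \in E(G)} \min(x(u), x(v)) / S(G)$ and $f(d) = (1 + d/2)/(1 + d/4)$, which is our reconstruction of the formulas in the proof.

```latex
\begin{align*}
\sum_{(u,v)\in E(G)} \min(x(u),x(v))
  &\le \sum_{(u,v)\in E(G)} \frac{x(u)+x(v)}{2}
   = \frac{1}{2} \sum_{v \in V} \deg(v)\,x(v)
   \le \frac{\Delta}{2}\,S(G)
   \quad\Longrightarrow\quad d \le \Delta, \\
f(\Delta) &= \frac{1+\Delta/2}{1+\Delta/4}
           = \frac{4+2\Delta}{4+\Delta}
           = 2 - \frac{4}{\Delta+4}.
\end{align*}
```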
4 The Non-preemptive Sum Multi-coloring Problem

In the non-preemptive version of the sum multi-coloring problem, the set of colors 
assigned to any vertex must be contiguous. This makes it harder to obtain algorithms 
with good approximation ratios. We give here an algorithm for general graphs with a 
logarithmic factor (times the ratio of the independent set algorithm used), as well as 
constant factor algorithms for bipartite and $k$-colorable graphs.

General Graphs 

Let $r$ be the number of different color requirements (lengths) in the graph. The following
is a $\rho \cdot \min\{O(\log p), O(\log n), r\}$-approximation algorithm for the npSMC problem.

Algorithm 2 SameLengthIS

Let Small $\leftarrow \{v \mid x(v) \le p/n^2\}$.

Color first the vertices of Small, arbitrarily but fully and non-preemptively.
Let $V'$ denote $V \setminus$ Small.

$G' \leftarrow (V', E', x', w)$, where
$x'(v) \leftarrow 2^{\lceil \log x(v) \rceil}$ for each $v \in V'$,

$w(v) \leftarrow 1/x'(v)$, the weight of each $v \in V'$, and
$E' \leftarrow E \cup \{(u, v) : x'(u) \ne x'(v)\}$.

Apply weighted MAXIS to $G'$, coloring the vertices fully in each step.
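The preprocessing step of SameLengthIS (splitting off Small, rounding lengths up to powers of two, and joining vertices of different rounded lengths) can be sketched as below. The function names and the input format are assumptions made for illustration; the weighted MAXIS step itself is not shown.

```python
from itertools import combinations


def round_up_pow2(length):
    """Smallest power of two that is >= length (length is a positive integer)."""
    return 1 << (length - 1).bit_length()


def build_same_length_instance(vertices, edges, x):
    """Return (small, x_prime, w, e_prime) describing Small and the graph G'."""
    n = len(vertices)
    p = max(x.values())  # largest color requirement
    small = {v for v in vertices if x[v] <= p / n ** 2}
    v_prime = [v for v in vertices if v not in small]
    x_prime = {v: round_up_pow2(x[v]) for v in v_prime}
    w = {v: 1 / x_prime[v] for v in v_prime}
    # Keep the original edges inside V' ...
    e_prime = {frozenset((u, v)) for u, v in edges
               if u not in small and v not in small}
    # ... and connect every pair of vertices whose rounded lengths differ.
    for u, v in combinations(v_prime, 2):
        if x_prime[u] != x_prime[v]:
            e_prime.add(frozenset((u, v)))
    return small, x_prime, w, e_prime
```

After this step every independent set of $G'$ consists of vertices of one rounded length, which is what lets each round run its jobs to completion together.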

Theorem 6. SameLengthIS approximates npSMC within a factor of
$\min\{8\rho \log p, 16\rho \log n + 1.5\}$, assuming IS can be approximated within $\rho$.

Proof. The color sum of the vertices in Small is at most $(p/n^2)(1 + 2 + \cdots + n) < p/2$.
Also, they use at most $p/n$ colors in total, and yield an added cost of at most $p$ to
the coloring of $V'$. Thus, coloring Small contributes at most an additive 1.5 to the
performance ratio, when $p > n^2$.

Observe that G' is a super-graph of G, that the algorithm produces a valid multi- 
coloring, and that it is contiguous since jobs are executed to completion. Since the 
weights of G' are upper bounds on the weights of G, the cost of the coloring on G' is 
an upper bound of its cost on G. 

Observe that each color class is an independent set of weight within a $\rho$ factor of
maximum among independent sets in the current remaining graph. Thus, by Lemma 1,

$\mathrm{SMC}(G, \mathrm{SameLengthIS}) \le 4\rho \cdot \mathrm{pSMC}(G')$.

Any independent set of $G$ is partitioned into at most $z = \log(\min\{p, n^2\})$ independent
sets in $G'$, one for each length class. In addition, the rounding up of the lengths at
most doubles the optimum cost of the instance. Thus, $\mathrm{pSMC}(G') \le 2z \cdot \mathrm{pSMC}(G)$. □

Also, the factor log p can be replaced by log q, where q is the ratio between the largest 
to the smallest color requirement. 

Notice that our analysis rates our algorithm (that in fact produces a co-schedule, as 
detailed in the next section) in terms of a stronger adversary, that finds the best possible 
preemptive schedule. 







Bipartite Graphs and $k$-colorable Graphs

Let $G$ be a graph whose $k$-coloring $C_1, C_2, \ldots, C_k$ is given. Let $a$ be a constant to be
optimized, and let $d = a^k$. Let $C_i[x, y]$ denote the set of vertices in $C_i$ of lengths in the
interval $[x, y]$. Our algorithm is as follows:

Steps($G$, $a$)

Let $X$ be a random number uniformly chosen from $[0, 1]$.

$d \leftarrow a^k$

$Y \leftarrow d^X$

for $i \leftarrow 0$ to $\log_d p$ do
for $j \leftarrow 1$ to $k$ do

Aij i — d^ Y 

Color vertices of $C_j[A_{ij}/d, A_{ij}]$ using the next $\lfloor A_{ij} \rfloor$ colors

Steps can be derandomized, by examining a set of evenly spaced candidates for the 
random number X. The additive error term will be inversely proportional to the number 
of schedules evaluated. This yields the following results. 

Theorem 7. If $G$ is bipartite, then we can approximate npSMC($G$) within a 2.796
factor. If a $k$-coloring of a graph $G$ is given, then we can approximate npSMC($G$)
within a $1.55k + 1$ factor.

We note that the ratios obtained are absolute in terms of $S(G)$, instead of being
relative to the optimal solution. We strongly suspect that these are the best possible 
absolute ratios. 

We illustrate our approach by an example. Given a bipartite graph $G$, we color the
vertices into sets $C_1$ and $C_2$. Let $C_i[a, b]$ denote the set of vertices in $C_i$ whose length
is between $a$ and $b$, inclusive. The idea is as follows:

Color all vertices in $C_1[1, 1]$ with the first color, followed by $C_2[1, 2]$ with the
next 2 colors, $C_1[2, 4]$ with the next 4, $C_2[3, 8]$ with the next 8, etc.

In general, color $C_1[2^{2i-2} + 1, 2^{2i}]$, followed by $C_2[2^{2i-1} + 1, 2^{2i+1}]$, for
$i = 0, 1, \ldots$.

For a given vertex $v$ in, say, $C_1$, the worst case occurs when $x(v) = 2^{2i} + 1$, for some $i$.
Then, $v$ is finished in step $x(v) + \sum_{j=0}^{i} (2^{2j} + 2^{2j+1}) = x(v) + 2^{2i+2} - 1 = 5x(v) - 5$. The
same can be argued for any $u \in C_2$. Thus, we have bounded the worst case completion
time of any vertex by a factor of 5, giving a guarantee on the schedule completion to the
system and to each vertex.
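The factor-5 bound in this example is easy to check numerically. The sketch below numbers the rounds $r = 0, 1, 2, \ldots$ so that round $r$ serves side $C_1$ when $r$ is even and $C_2$ when $r$ is odd, uses $2^r$ colors, and covers lengths up to $2^r$. The function name and this round numbering are our own conventions, and we assume a vertex finishes as soon as it has received its $x(v)$ colors at the start of its round.

```python
def finish_time(side, length):
    """Finish time of a vertex on the given side (1 or 2) with the given length."""
    r = side - 1              # first round that serves this side
    while 2 ** r < length:    # skip rounds whose length bound is too small
        r += 2
    colors_before = 2 ** r - 1  # 1 + 2 + ... + 2^(r-1) colors used by earlier rounds
    return colors_before + length
```

For instance, a vertex of length 5 in $C_1$ waits for the rounds of sizes 1, 2, 4 and 8 and finishes at time 20, which equals $5x(v) - 5$.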

5 The Co-scheduling Problem 

Recall the definition of the co-scheduling problem. In this version, the multi-coloring
first chooses an independent set $I_1$. It assigns colors to all the vertices in $I_1$ using the
first $x_1$ colors, where $x_1 = \max_{v \in I_1} x(v)$. Then, the multi-coloring cannot use the first
$x_1$ colors for other vertices and therefore uses colors $x_1 + 1$, $x_1 + 2$, etc. to color the rest
of the vertices. The goal is to minimize the sum of the finish times of all vertices.
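A co-schedule is thus just an ordered list of independent sets, and its cost can be evaluated as in the following sketch. The function name and the representation of a schedule as a list of lists are assumptions for the example; we also assume each vertex in a round receives the first $x(v)$ colors of that round.

```python
def co_schedule_sum(rounds, x):
    """Sum of finish times when independent sets are processed round by round.

    rounds is a list of independent sets (lists of vertices); a round occupies
    as many colors as its longest job, and no later round may reuse them.
    """
    start, total = 0, 0
    for ind_set in rounds:
        for v in ind_set:
            total += start + x[v]              # v gets colors start+1 .. start+x(v)
        start += max(x[v] for v in ind_set)    # the round blocks max x(v) colors
    return total
```

With requirements a: 3, b: 1, c: 2 and rounds [a, b] followed by [c], the finish times are 3, 1 and 5.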

We show that Algorithm SameLengthIS of the previous section approximates the
co-SMC problem within a constant factor, comparing favorably with the logarithmic
factor for np-SMC. We then construct a graph $H$ for which $\mathrm{coSMC}(H) = \Omega(\log p) \cdot
\mathrm{npSMC}(H)$, indicating that this discrepancy in the performance is inherent.

General Graphs 



Theorem 8. SameLengthIS approximates coSMC within a factor of $16\rho$, assuming
IS can be approximated within a factor of $\rho$.

Proof. Let $\Psi$ be the coloring produced by SameLengthIS on $G'$. As the algorithm
colors in rounds, it produces a valid co-schedule. Each color class is an independent set
of weight within a $\rho$ factor of maximum, among independent sets in the current remaining
graph. Thus, by Lemma 1, $\mathrm{SMC}(G, \Psi) \le 4\rho \cdot \mathrm{coSMC}(G')$.

We now relate the optimal schedules of $G'$ and $G$. Let $X$ be the length of the longest
job in a given round of an optimal schedule of $G$. Rounding the length to a power of
two results in a length $X' = 2^{\lceil \log X \rceil}$. Consider the schedule that breaks the round into a
sequence of rounds of length $1, 2, 4, \ldots, X'$, for a total length of $2X' - 1 < 4X$. This
is a valid schedule of $G'$, where each vertex is delayed by at most a factor of 4. Hence,
$\mathrm{coSMC}(G') \le 4 \cdot \mathrm{coSMC}(G)$, and the theorem follows. □
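The length bookkeeping in this argument can be checked with a few lines of Python. The helper name `split_round` is ours; it only reproduces the arithmetic of replacing one round of length $X$ by rounds of lengths $1, 2, 4, \ldots, X'$.

```python
def split_round(X):
    """Return (X', pieces): X rounded up to a power of two, and the round lengths."""
    x_prime = 1 << (X - 1).bit_length()  # smallest power of two >= X
    pieces = []
    length = 1
    while length <= x_prime:
        pieces.append(length)
        length *= 2
    return x_prime, pieces
```

For $X = 5$ this gives $X' = 8$ and rounds of lengths 1, 2, 4, 8, whose total 15 is below $4X = 20$.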

A Construction Separating co-SMC and np-SMC 

We have given $O(1)$-approximations to the pSMC problem and to the coSMC problem,
while only a $\min\{O(\log p), O(\log n)\}$-approximation for the npSMC problem. It would
be interesting to know the precise relationships among these three models. We give a partial
answer by constructing an instance $H$ for which $\mathrm{coSMC}(H) = \Omega(\log p) \cdot \mathrm{npSMC}(H)$.

For $i = 0, \ldots, \log p$, $j = 1, \ldots, p/2^i$, and $k = 1, \ldots, 2^i$, the graph $H$
has vertex set $V = \{v_{i,j,k}\}$, where the color requirement of $v_{i,j,k}$ is $2^i$, and edge set
$E = \{(v_{i,j,k}, v_{i',j',k'}) : i = i' \text{ and } k \ne k'\}$. In words, the graph has $p$ vertices of each
color requirement $\ell = 2^i$, arranged in completely connected independent sets of size
$p/\ell$, with vertices of different requirements non-adjacent.

Consider the straightforward non-preemptive coloring where the different color 
requirements are processed independently and concurrently. Then, the makespan of 
the schedule is $p$, and since the graph has $p \lg p$ vertices, the multi-coloring sum is
$O(p^2 \log p)$.

On the other hand, any independent set contains at most $2^i$ vertices from each color
requirement group $i$. Hence, in any independent set of length $\ell$ in a co-schedule, there
are at most $2\ell$ vertices. Thus, at most $2t$ vertices are completed by step $t$, for each
$t = 1, 2, \ldots, p \log p/2$. In particular, at most half of the vertices are completed by step
$p \log p/4$. Thus, the coloring sum of the remaining vertices is $\Omega(p^2 \log^2 p)$. Hence, there is an
$\Omega(\log p)$ separation between these models. As $n = p \cdot \log p$, this result also implies an
$\Omega(\log n)$ separation.

6 The MAXIS Heuristic 

We describe in this section a construction showing that the performance ratio of MAXIS 
for the sum coloring problem is exactly 4, up to low order terms. Given the central use of 
(weighted) sum coloring in our algorithms for multi-coloring general graphs, this yields 
a good complementary bound for our heuristics. 
We represent a coloring of a graph $G$ by $k$ colors as a tuple of length $k$: $(c_1, \ldots, c_k)$.
The size of the set $C_i$ of the vertices colored by $i$ is $c_i$. By definition, for a given pattern
$P = (c_1, \ldots, c_k)$ the sum coloring is $SC(P) = \sum_{i=1}^{k} i \cdot c_i$.

Given a pattern $P = (c_1, \ldots, c_k)$, we describe a chopping procedure which constructs
a graph $G_P$ with $n = \sum_{i=1}^{k} c_i$ vertices. In $G_P$, there are $k$ independent sets
$C_1, \ldots, C_k$ that cover all the vertices of the graph, such that $|C_i| = c_i$. We now place
the vertices of the graph in a matrix of size $c_1 \times k$. The $i$th column contains $c_i$ ones at
the bottom and $c_1 - c_i$ zeros at the top. Each vertex is now associated with a one entry
in the matrix.

The chopping procedure first constructs an independent set $I_1$ of size $c_1$. It collects
the vertices from the matrix line after line, from the top line to the bottom line. In each
line, it collects the vertices from right to left. Each one entry that is collected becomes
a zero entry. Then it adds edges from $I_1$ to all the other vertices, as long as these edges
do not connect two vertices from the same column. In the same manner, the procedure
constructs $I_2$. The size of $I_2$ is the number of ones in the first column after the first step. At
the beginning of the $i$th step, the procedure has already constructed $I_1, \ldots, I_{i-1}$, defined
all the edges incident to these vertices, and replaced all the one entries associated with
the vertices of the sets $I_1, \ldots, I_{i-1}$ by zeros. During the $i$th step, the chopping procedure
constructs in a similar manner the independent set $I_i$, the size of which is the number
of ones in the first column of the matrix at the beginning of the step. Again, each one
entry that is collected becomes a zero entry. Then the procedure connects the vertices
of $I_i$ with the remaining vertices in the matrix, as long as these edges do not connect
two vertices from the same column. The procedure terminates after $h$ steps when the matrix
contains only zeros, leaving the graph with another coloring of $G$, $I_1, \ldots, I_h$, which
corresponds to a possible coloring by MAXIS.
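The sizes of the sets $I_1, \ldots, I_h$ can be obtained by simulating the matrix alone. The sketch below does exactly that; it does not build the edges of $G_P$, its function names are ours, and it assumes that $c_1$ is the largest entry of the pattern so that the matrix has $c_1$ rows.

```python
def chop(pattern):
    """Return the sizes of the independent sets I_1, I_2, ... built by chopping."""
    k, rows = len(pattern), pattern[0]
    assert all(c <= rows for c in pattern), "sketch assumes c_1 is the largest entry"
    # ones[r][c] is True when row r (0 = top) of column c still holds a vertex;
    # column c starts with pattern[c] ones at the bottom.
    ones = [[r >= rows - pattern[c] for c in range(k)] for r in range(rows)]
    sizes = []
    while any(any(row) for row in ones):
        size = sum(ones[r][0] for r in range(rows))  # ones left in the first column
        taken = 0
        for r in range(rows):                # top line to bottom line
            for c in reversed(range(k)):     # right to left within a line
                if taken < size and ones[r][c]:
                    ones[r][c] = False       # a collected one becomes a zero
                    taken += 1
        sizes.append(size)
    return sizes


def sum_coloring(pattern):
    """SC(P) = sum over i of i * c_i, with colors numbered from 1."""
    return sum(i * c for i, c in enumerate(pattern, start=1))
```

Applied to the pattern (36, 36, 9, 4), the simulation yields twelve sets whose sum coloring is 240, against 151 for the original pattern.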

To achieve the desired bound, we build a pattern as follows. We let the first two entries 
be equal, and after two steps of chopping they should be equal to the third entry. After 
two additional chopping steps we would like to have the first four entries to be equal, 
and so on. Small examples of such patterns are (4, 4, 1) for 7 = 3 and (36, 36, 9, 4) 
for 7 = 4. For 7 = 4, MAXIS produces the pattern (36, 18, 9, 6, 4, 3, 3, 2, 1, 1, 1, 1) 
and then the ratio is 240/151 > 1.589. More formally, for x > 1, consider the pattern 

LB = (x, X, , pHrjyi-, -§ 1 ^ ■ We choose x such that LB contains only integral 

numbers (e.g., a; = (7!)^). In the chopping procedure, once we arrive at a pattern of equal 
size, we just take these 7 + 1 columns as the next 7 + 1 entries. Thus, it yields the pattern 



A{LB) = 



’ 2> 4> 6’ 9’ 12> • • • ’ (fe-1) 



2 ) 



(fe-l)fe’ > • • • ’ 



where ^ appears 7 + 1 



times. 

It can be shown, that S'C( Li?) < (77^ + 2. 65)a; and S'C(A(Li?)) > (477^ — 0.65)a:, 
where — The approximation ratio of MAXIS is 4 — 0(1/ In 7). 
Abstract. The achromatic number problem is as follows: given a graph G =
(V,E), find the greatest number of colors in a coloring of the vertices of G such 
that adjacent vertices get distinct colors and for every pair of colors some vertex 
of the first color and some vertex of the second color are adjacent. This problem is 
NP-complete even for trees. We present improved polynomial time approximation 
algorithms for the problem on graphs with large girth and for trees, and linear 
time approximation algorithms for trees with bounded maximum degree. We also 
improve the lower bound of Farber et al. for the achromatic number of trees with 
maximum degree bounded by three. 



1 Introduction 

The achromatic number of a graph is the maximum size k of a vertex coloring of the
graph, where every pair of the k colors is assigned to some two adjacent vertices and 
adjacent vertices are colored with different colors. The achromatic number problem is 
to compute the achromatic number of a given graph. This concept was first introduced 
in 1967 by Harary et al. [8] in a context of graph homomorphism (see [7]). A related 
well known problem is the chromatic number problem, that is computing the minimum 
size of a vertex coloring of a graph, where adjacent vertices are colored with different 
colors. These two problems are different in nature. For example, removing edges from 
a graph cannot increase its chromatic number but it can increase the achromatic number 
[10]. 

Previous Literature. Yannakakis and Gavril [11] showed that the achromatic number 
problem is NP-complete. It is NP-complete also for bipartite graphs as proved by Farber 
et al. [7]. Furthermore, Bodlaender [1] proved that the problem remains NP-complete 
when restricted to connected graphs that are simultaneously cographs and interval graphs. 
Cairnie and Edwards [3] show that the problem is NP-complete even for trees. (However,
the chromatic number of a tree can be computed in polynomial time. For a tree that is 
not a single node, the chromatic number is always two.) 

* The author is supported by Deutsche Forschungsgemeinschaft (DFG) as a member of the
Graduiertenkolleg Informatik, Universität des Saarlandes, Saarbrücken.

** Partially supported by Komitet Badań Naukowych, grant 8 T11C 032 15.

J. Nešetřil (Ed.): ESA'99, LNCS 1643, pp. 402-413, 1999.
Let $l$ be a fixed positive integer. There are known exact polynomial time algorithms
for two very restricted classes of trees: for trees with not more than $l$ leaves, and for
trees with $\binom{l}{2}$ edges and with at least $\binom{l-1}{2} + 1$ leaves [7]. Cairnie and Edwards [4] have
proved that the achromatic number problem for trees with constant maximum degree
can also be solved in polynomial time. Some phases of their algorithm are based on 
enumeration, and the algorithm has running time where m is the number of 

edges of the tree. 

An $\alpha$-approximation algorithm [9] for a maximization problem $\Pi$ is a polynomial
time algorithm that always computes a solution to the problem $\Pi$ whose value is at least
a factor $1/\alpha$ of the optimum. We call $\alpha$ the approximation ratio. For a formal definition
of asymptotic approximation ratio see [9], but it is clarified below what we mean by
asymptotic.

Chaudhary and Vishwanathan [5] presented a 7-approximation algorithm for the
achromatic number problem on trees. They also give an $O(\sqrt{n})$-approximation algorithm
for graphs with girth (i.e. length of the shortest cycle) at least six, where $n$ is the number
of vertices of the graph.

Our Results. Our first result is a 2-approximation algorithm for trees, which improves 
the 7-approximation algorithm in [5]. Our algorithm is based on a different idea from 
that of [5]. The algorithm is presented in Section 5. 

Let $d(n)$ be some (possibly increasing) function and $T$ be a tree with $n$ vertices.
Assuming that the maximum degree of $T$ is bounded by $d(n)$, we developed a combinatorial
approach to the problem that is an alternative to that of Section 5. This lets us reduce the
approximation ratio of 2 to 1.582. An additional result is a 1.155-approximation algorithm
for binary trees, i.e. with maximum degree at most 3. The ratios 1.582 and 1.155 are
proved to hold asymptotically as the achromatic number grows. For example, the first
algorithm produces an achromatic coloring with at least $\Psi(T)/1.582 - O(d(n))$ colors,
where $\Psi(T)$ is the achromatic number of $T$. We show that the algorithms for bounded
degree trees can be implemented in linear time in the unit cost RAM model. Although, 
our algorithms for bounded degree trees are approximate and the algorithm of [4] for 
constant degree trees is an exact one, our algorithms have linear running time (running 
time of the algorithm in [4] is J7(m^^®)) and they also work on trees with larger maxi- 
mum degree (e.g. log(n) or even (n — 1)^^^, where n is number of vertices of the tree). 
We also improve a result of Farber et al. [7] giving a better lower bound for the achro- 
matic number of trees with maximum degree bounded by three. These results appear in 
Section 4. 

Our next result presented in Section 6 is an -approximation algorithm for 

graphs with girth at least six, which improves the $O(n^{1/2})$-approximation in [5]. This
algorithm is a consequence of our 2-approximation algorithm for trees. 

Let $\Psi = \Psi(G)$ be the achromatic number of a graph $G$. Chaudhary and Vishwanathan
[5] show an 0(!7(G)^/^)-approximation algorithm for graphs G with girth > 6, and 
also state a theorem that for graphs with girth $\ge 7$, there is an $O(\sqrt{\Psi(G)})$-approximation
algorithm for the problem. They also conjectured that for any graph $G$, there is a $\sqrt{\Psi(G)}$-
approximation algorithm for the achromatic number problem. We prove their conjecture
for graphs of girth $\ge 6$, showing an $O(\sqrt{\Psi(G)})$-approximation algorithm for such graphs.
This result is also described in Section 6.
Our approximation algorithms are based on a tree partitioning technique. With this
technique we prove some combinatorial results for trees, that can be of independent 
interest. 



2 Preliminaries 



In this paper we consider only undirected finite graphs. For a graph $G$, let $E(G)$ and $V(G)$
denote the set of edges and the set of vertices of $G$, respectively. Given $v_1, v_2 \in V(G)$,
the distance between $v_1$ and $v_2$ is the number of edges in the shortest path between $v_1$ and
$v_2$. Let $e_1, e_2 \in E(G)$. The distance between $e_1$ and $e_2$ is the minimum of the distances
between an end-vertex of $e_1$ and an end-vertex of $e_2$. We say that $e_1, e_2$ are adjacent if
the distance between $e_1$ and $e_2$ is 0. Moreover, we say that a vertex $v \in V(G)$ and an
edge $e \in E(G)$ are adjacent if $v$ is an end-vertex of $e$.

A coloring of a graph $G = (V, E)$ with $k$ colors is a partition of the vertex set $V$
into $k$ disjoint sets called color classes, such that each color class is an independent set.
A coloring is complete if for every pair $C_1, C_2$ of different color classes there is an edge
$(v_1, v_2) \in E$ such that $v_1 \in C_1$, $v_2 \in C_2$ ($C_1$ and $C_2$ are adjacent). The achromatic
number $\Psi(G)$ of the graph $G$ is the greatest integer $k$ such that there exists a complete
coloring of $G$ with $k$ color classes. A partial complete coloring of $G$ is a coloring in
which only some of the vertices have been colored but every two different color classes
are adjacent. We also consider a coloring as a mapping $c : V \to \mathbb{N}$, where $c(v) > 0$
denotes the color (color class) assigned to $v$. It is obvious that any partial complete
coloring with $k$ colors can be extended to a complete coloring of the entire graph with
at least $k$ colors. Thus, the number of colors of a partial complete coloring is a lower
bound for the achromatic number, and we can restrict our attention to subgraphs in order
to approximate the achromatic number of the whole graph.
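These definitions translate directly into a small checker. The following sketch tests whether a coloring is complete and, for very small graphs only, finds the achromatic number by exhaustive search; the function names and the edge-list input format are our assumptions, and the exhaustive search is exponential, in line with the hardness results cited above.

```python
from itertools import product


def is_complete_coloring(edges, color):
    """True if color is proper and every pair of used colors shares an edge."""
    if any(color[u] == color[v] for u, v in edges):
        return False  # two adjacent vertices share a color
    used = set(color.values())
    adjacent_pairs = {frozenset((color[u], color[v])) for u, v in edges}
    return len(adjacent_pairs) == len(used) * (len(used) - 1) // 2


def achromatic_number(vertices, edges):
    """Exhaustive search over all colorings (tiny graphs only)."""
    n, best = len(vertices), 0
    for assignment in product(range(1, n + 1), repeat=n):
        color = dict(zip(vertices, assignment))
        if is_complete_coloring(edges, color):
            best = max(best, len(set(assignment)))
    return best
```

For example, the path with three edges admits the complete coloring 1-2-3-1, while a star with three leaves cannot use more than two colors because no two leaves are adjacent.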

A star is a tree all of whose edges are adjacent to a common vertex, called the center of
the star. The size of a star is the number of its edges. A path with trees is a path with
trees hanging from some internal vertices of the path. These trees are called path trees 
and the path is called spine. A path with stars is a path with stars hanging from some 
internal vertices (being stars’ centers) of the path. 

For a given tree T we call a leaf edge an edge which is adjacent to a leaf of T. A 
system of paths with trees for T is a family {T\ , . . . , } of subtrees of T such that: 

(1) each subtree $T_i$ is a path with trees, (2) any two subtrees $T_i, T_j$ ($i \ne j$) are vertex
disjoint, and (3) the family is maximal, i.e. if we add to the family any edge of the tree
which does not belong to the family, then we violate condition (1) or (2). The edges of
the tree which do not belong to any subtree Ti of the given system of paths and are not 
adjacent to any leaf, are called links. If a given system of paths with trees consists of 
paths with empty trees, then it is called system of paths. We say that there is a conflict 
via an edge (or link) if there are two vertices colored with the same color and joined by 
the edge (or link). We proceed with an easy fact. 

Lemma 1. Let $T = (V, E)$ be any tree and $S$ be any system of paths for $T$. Then $S$
contains at least $|E| - \#(\text{leaf edges of } T) + 1$ edges.
3 Coloring Paths

Here, we optimally color any path. This is a starting point for Section 4. 

Lemma 2. There is a linear time algorithm that finds a complete coloring of any path
$P$ with $\Psi(P)$ colors.

Proof. Let $f(l) = \binom{l}{2}$ if $l$ is odd, and $f(l) = \binom{l}{2} + \frac{l-2}{2}$ if $l$ is even. Let $P$ consist
of $f(l)$ edges. We show how to color $P$ with $l$ colors. Let $l$ be odd. We proceed by
induction on $l$. For $l = 1$ and $l = 3$ the appropriate colorings are respectively: (1) and
(1)-(2)-(3)-(1). Note that the last color is 1. Now let us suppose we can color any
path with $f(l)$ edges using $l$ colors, such that the last vertex is colored with 1. Then, we
extend this coloring by continuing from that last vertex (1) with the following sequence of
colored vertices:
$(1)-(l+1)-(2)-(l+2)-(3)-(l+1)-(4)-(l+2)-\cdots-(l-1)-(l+2)-(l)-(l+1)-(l+2)-(1)$.
A similar proof for an even $l$ and a simple proof of optimality
are omitted. To color $P$ optimally, we first compute $f(l)$ ($l$ is equal to $\Psi(P)$). It is
straightforward to check that the above algorithm runs in linear time. □
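The inductive construction for odd $l$ can be sketched as follows. The code follows our reading of the appended sequence, in which the two new colors alternate between consecutive old colors $2, 3, \ldots, l$ and the extension closes with $(l+1)-(l+2)-(1)$; the function names are ours.

```python
def odd_path_coloring(l):
    """Colors along a path with l(l-1)/2 edges, using l colors (l odd)."""
    assert l >= 1 and l % 2 == 1
    seq, cur = [1], 1             # cur = number of colors used so far (odd)
    while cur < l:
        a, b = cur + 1, cur + 2   # the two new segment colors
        for j in range(2, cur + 1):
            seq.append(a if j % 2 == 0 else b)  # new color in front of old color j
            seq.append(j)
        seq += [a, b, 1]          # close the extension; the last color is 1 again
        cur += 2
    return seq


def covers_all_pairs(seq):
    """True if neighbors differ and every pair of used colors is adjacent once."""
    pairs = [frozenset(p) for p in zip(seq, seq[1:])]
    k = len(set(seq))
    return all(len(p) == 2 for p in pairs) and len(set(pairs)) == k * (k - 1) // 2
```

For $l = 5$ this produces 1-2-3-1-4-2-5-3-4-5-1, a path with 10 edges in which each of the 10 color pairs appears on exactly one edge.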

The above lemma can be extended to any system $S$ of paths in which each link is
adjacent to the beginning of some path (this property will be used later). We first define
a partial order $\prec$ on paths of the system. Let us consider a path $p$ of our system, such that
there is a vertex $u$ on $p$ with links $(u, v_1), \ldots, (u, v_j)$. Let $p(u, v_i) \ne p$ denote the path
of the system with vertex $v_i$, for $i = 1, \ldots, j$. For any path $p$ of the system and any vertex
$u$ on $p$, if $j > 1$, then we require that the number of paths preceding $p$ among the paths
$p(u, v_1), \ldots, p(u, v_j)$ is at most one, i.e. $\#\{p(u, v_i) : i = 1, \ldots, j,\ p(u, v_i) \prec p\} \le 1$.
This partial order on paths of the system can be extended to a linear order, which we call
path order. Then we form one big path $P$ by connecting subsequent paths by the path
order. Now we can color $P$ as in the proof of Lemma 2. However, this could cause some
conflicts via links. Let $s$ be the first position in $P$ where such a conflict appears. We can
remove it by replacing colors of all vertices of $P$ starting from this position with the
colors of their right neighbors. We continue this procedure until there are no conflicts.

The partial order can be found in linear time and extended to a linear order using the 
topological sort in time $O(|S|)$ [3]. Computing $P$ thus takes linear time as well. After
storing P in an array, the new colors on P can be computed in linear time by pointers 
manipulation (the details are omitted). Hence, we have: 

Lemma 3. There is a linear time algorithm which finds an optimal complete coloring 
of any system of paths in which each link is adjacent to the beginning of some path. 

4 Coloring Bounded Degree Trees 

We first show how to find in a bounded degree tree a large system of paths with stars. 
A lemma below can be proved by an induction on the height of the tree. This also gives 
a simple algorithm to compute a system of paths: given a tree with root r, compute 
recursively the systems of paths for each subtree of r and output the union of these 
systems and one of the edges adjacent to r. 
Lemma 4. In any tree T = (V,E) with maximum degree d, there exists a system S of
paths with stars consisting of at least 2 5 ^(d-i) ' \^\ ^dges from E. Moreover each link 
is adjacent to the first vertex of some path. 

Let T be a tree with maximum degree d and let k = max {p : ( 2 ) < 2 5 (d-i) I -S' I }■ So 
^ — \j 2 5 (d-i ) \/2|-E|- Let h : N — ^ TZ+ be a function such that h(k) ^ 0(d) < k. 

We design an algorithm that finds a complete coloring of T with at least ■4^{T) — 
h{k) colors. The algorithm is a generalization of the algorithm from Lemma 2. 

First, we give a description of some ideas behind the algorithm. We compute in T 
a system S of paths with stars by Lemma 4. We have many paths with stars and some 
of them are connected by links. We will be taking dynamically consecutive paths with 
stars from S according to the path order defined in Section 3, and concatenating them, 
forming finally a big path P with stars. 

The subpath of the spine of $P$ (together with its stars) used to add two new colors
$l+1, l+2$ (for an odd $l$; see the proof of Lemma 2) is called a segment $(l+1, l+2)$. The
colors $l+1$ and $l+2$ are called segment colors, while the colors $1, 2, \ldots, l$ are called
non-segment colors for segment $(l+1, l+2)$. The vertices of the segment $(l+1, l+2)$ with
segment (resp. non-segment) colors are called segment (resp. non-segment) vertices.
During the coloring of segment $(l+1, l+2)$, we want to guarantee connections between
the segment colors and non-segment ones. As in the proof of Lemma 2, all the segments
begin and end with color 1. We assume that: the end-vertex of a segment is the beginning
vertex of the next segment, each segment begins and ends with a vertex with color 1, and the
consecutive positions (vertices of the spine) of the segment are numbered
$1, 2, \ldots$.

We will color $P$ segment by segment. For each $i \in \{1, \ldots, k-2\}$, define the set
$P(i) = \{i+1, i+2, \ldots, k\}$ if $i$ is odd, and $P(i) = \{i+2, i+3, \ldots, k\}$ if $i$ is even.
$P(i)$ is intended to contain all the segment colors of the segments that should contain $i$
as a non-segment color. The sets $P(i)$ will be changing as the algorithm runs. We define
a sack $\mathcal{S}$ to be a set of some pairs $(x, y)$ of colors such that $x \ne y$. Intuitively, $\mathcal{S}$ will
contain the connections between some pairs of colors such that we have to guarantee
these connections. These connections will be realized in the last phase of the algorithm.
We also need a variable waste to count the number of edges that we lose (do not use
effectively) in the coloring.

We now give a description of main steps of the algorithm. Using Lemma 4 find in T 
a system of paths S with stars with > 2 5 .(d-i) ' \^\ sdges. Set S' 0 and waste 0. 

We show how to color segment by segment, interleaving the phases of coloring a
segment and taking dynamically next paths with stars from $S$ w.r.t. the path order. Assume
that we have colored all the previous segments $(i, i+1)$ of $P$ for $i = 2, 4, 6, \ldots, l-2$,
and thus we have taken into account all the connections between colors $1, 2, \ldots, l-1$
($l$ is even). (For an odd $l$ the proof is almost identical.) From now on we describe how
to color the next segment $(l, l+1)$. We first take some next paths from $S$ that follow the
path order, for coloring of the whole segment $(l, l+1)$. We concatenate these paths one
by one. In some cases this concatenation will be improved later. We call a segment the
portion of the big path composed of these concatenated paths. These paths are called
segment paths.
Fig. 1. Improving the concatenation in the Fixing positions phase, subcases (a)-(c).

Fixing positions. We establish the segment and non-segment positions (vertices)
such that if there is a link between one segment path and the first vertex $u$ of a second
segment path, then $u$ is assigned a non-segment position.

Establish the (preliminary) segment and non-segment positions (only) on the spine
of the segment exactly as in the proof of Lemma 2. Let $p_1$ and $p_2$ be any two segment
paths, such that $p_1$ has a link $e$ from a vertex $z$ of $p_1$ to the beginning vertex $v_1$ of $p_2$.
Let $v_1, v_2, \ldots, v_k$ be the consecutive vertices of $p_2$, and $v$ be the last vertex on $p_1$ ((a)
in Figure 1).

If $v_1$ is assigned a segment position and $p_2$ has an odd length, then improve the
concatenation of the segment paths by reversing $p_2$: glue the vertex $v$ with $v_k$. If $p_2$ is of
even length, and $v_1$ is assigned a segment position, then proceed as follows. If there is
a star of size $\ge 1$ centered at $v_1$, then make an end-vertex (say $w$) of this star the first
vertex of path $p_2$: glue $v$ with $w$ (improving the concatenation) ((b) in Figure 1). If there
is no star centered at $v_1$, and there is a star of size $\ge 1$ centered at $v_2$, then let $(v_2, w)$ be
its arbitrary edge and:

1. If $p_1$ and $p_2$ are located within segment $(l, l+1)$, then if $z$ has a segment position
(color $l$ or $l+1$), then glue $v$ with $w$ (improving the concatenation), and treat the
edge $(v_2, v_1)$ as a new star of size one ((c) in Figure 1).

2. If $p_1$ is located in some segment preceding $(l, l+1)$, then $z$ has been colored so
far, and if $c(z) \in \{l, l+1\}$, then perform the steps from the previous step 1 with
"improving the concatenation" (as (c) in Figure 1).

If there are no stars with centers at $v_1$ and at $v_2$, then glue $v_2$ with $v$ (improving the
concatenation) and edge $(v_2, v_1)$ is treated as a new star of size one. Thus, $v_2$ is assigned
a segment position and $v_1$ a non-segment position.
We will color all the segment positions with colors $l$ and $l+1$. It follows that there
will be no conflicts, since for each link at least one of its end-vertices has a non-segment
position. Throughout the algorithm we use the following easy observation (applicable
also for coloring in the proof of Lemma 2): if we permute arbitrarily non-segment colors
on their positions, then we will still obtain a valid coloring of segment $(l, l+1)$; and
secondly: we can exchange the two segment colors $l, l+1$ with each other, obtaining
still a valid coloring.

Balancing. We want to color segment (l, l + 1). We first calculate an end-position
of this segment (a position on the spine). Let N(j) = {i : j ∈ P(i)} for j ∈ {l, l + 1}
and let W(l, l + 1) = N(l) ∩ N(l + 1). Let σ_0(p, q), for some segment positions p < q,
denote the sum of the sizes of all stars centered at positions p' ∈ {p, p + 1, . . . , q} of segment
(l, l + 1) such that p' ≡ 0 mod 4, and let σ_2(p, q) denote the analogous sum for positions
p' ∈ {p, p + 1, . . . , q} such that p' ≡ 2 mod 4. Given any position p' of segment (l, l + 1),
let δ(p') = σ_0(1, p') − σ_2(1, p'). Given a position p' of segment (l, l + 1), we define
p'' = γ(p') < p' to be the smallest position index such that δ(p'') > δ(p')/2 if δ(p') > 0,
and such that δ(p'') < δ(p')/2 if δ(p') < 0. As the end of the segment we take the minimum
position index q such that 2|W(l, l + 1)| + 2d − 4 ≤ q − 1 + σ_0(1, q − 1) + σ_2(1, q − 1).
We set S ← S ∪ {(a, l), (b, l + 1) : a ∈ N(l) \ W(l, l + 1), b ∈ N(l + 1) \ W(l, l + 1)}.
Now we perform balancing: exchange segment colors l and l + 1 with each other on all
segment positions to the right of the position γ(q − 1). Let l' ∈ {l, l + 1} be the color of
the first position to the right of γ(q − 1). Set S ← S ∪ {(l', c(γ(q − 1)))}. During coloring
segment (l, l + 1), we take into account only colors from the intersection of N(l) and
N(l + 1). The remaining colors a ∈ N(l) \ N(l + 1) and b ∈ N(l + 1) \ N(l)
are added to S.
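
The quantities used in balancing, namely the sums of star sizes at spine positions congruent to 0 and to 2 modulo 4 and their difference, can be illustrated by a small sketch. This is only an illustration under stated assumptions: the dictionary `star_size` (spine position, counted from 1, mapped to the size of the star centered there) and the function names `sigma` and `delta` are hypothetical and do not come from the paper.

```python
def sigma(star_size, p, q, residue):
    """Sum of the sizes of stars centered at positions p' in {p, ..., q}
    with p' congruent to `residue` modulo 4."""
    return sum(size for pos, size in star_size.items()
               if p <= pos <= q and pos % 4 == residue)


def delta(star_size, p):
    """Difference between the residue-0 sum and the residue-2 sum over 1..p."""
    return sigma(star_size, 1, p, 0) - sigma(star_size, 1, p, 2)


# Hypothetical segment: stars of sizes 3, 1 and 2 centered at positions 2, 4 and 6.
star_size = {2: 3, 4: 1, 6: 2}
print(delta(star_size, 4), delta(star_size, 6))
```

A negative difference means that the positions congruent to 2 modulo 4 currently carry more star edges; balancing exchanges the two segment colors on a suffix of the segment to even this out.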

Coloring. We first give the intuition behind the coloring steps. The stars with centers
in segment vertices will be used to shorten the segment. Namely, we can skip a colored
fragment (l) − (x) − (l + 1) − (y) − (l) of the spine if we find some segment vertices
with stars such that some of them have color l and some have color l + 1. We just assign
colors l, l + 1 to the centers of the stars, and the colors x, y to appropriate end-vertices of the
stars. To use stars with centers in segment vertices economically, the balancing above
guarantees roughly the same number (±(2d − 4)) of star edges for the segment
vertices with color l and for the segment vertices with color l + 1.

The stars with centers in non-segment vertices will be used to reduce the sizes of the sets
P(i). A set P(i) can be considered as a set of segment colors: j ∈ P(i) means that
the color i is to appear (as a non-segment color) in a segment in which j is a
segment color. But possibly we can realize the connection (i, j) before j becomes a
segment color: before we start coloring segment (j, j + 1), if we find a star with a center
colored by i, we will color end-vertices of the star with the greatest colors from P(i)
and delete them from P(i). This lets us reduce the number of non-segment colors we
have to consider during the coloring of segment (j, j + 1) and lets us estimate |S| and the
value of waste.

Now, we describe the coloring in more detail. Color all the segment positions with
colors l, l + 1. Now, color all non-segment positions on the spine of segment (l, l + 1)
and on end-vertices of the stars with colors from W(l, l + 1). The coloring is performed
dynamically to avoid conflicts: take the consecutive non-segment colors from W(l, l + 1)
and put them on arbitrary non-segment spine positions or end-vertices of the stars (called
non-segment positions as well). If c ∈ W(l, l + 1) causes a conflict on a non-segment
position u via a link, then color u with any other color from W(l, l + 1), avoiding the
conflict. If c gives conflicts on each non-segment position, then put all other non-segment
colors on those positions, add (c, l), (c, l + 1) to S, and set waste ← waste + 3.

If, while improving the concatenation, two vertices, each adjacent to a
link, were glued into a vertex x, then this may cause a conflict for two non-segment colors c and c':
one can color x with neither c nor c'. There may be several such conflict positions
x within segment (l, l + 1). In this case, proceed by analogy to the above: if there is a
conflict for c, c' on each such position x, then put other non-segment colors on those
positions, add (c, l), (c, l + 1), (c', l), (c', l + 1) to the sack S, and set waste ← waste + 6.

Put segment colors from the sets P(i) on end-vertices of the non-segment stars: if
i is a non-segment color on the center of a star, then assign to the end-vertices
v of the star the greatest colors from P(i), and delete these colors from P(i). If some
color c ∈ P(i) gives a conflict on some v via a link, then put c on some other end-vertex
of the star. If c gives conflicts on all end-vertices of the star, then continue assigning
with the next greatest color in P(i), delete it from P(i) and if |P(i)| = 1, then set
waste ← waste + d − 1. This completes the description of the coloring of segment (l, l + 1).

Last step. (This step is performed after all the segments have been colored.) Realize
connections from S: greedily use every third edge from the spine to the right of the
end-position of the last colored segment. This completes the description of the algorithm.

In the algorithm we use the greatest colors from P(i). Note that this trick assures
that N(l + 1) ⊆ N(l), so no pair (b, l + 1) will be added to S in balancing. Moreover,
this also gives that for each color a ∈ {1, . . . , k}, there exists at most one segment color
l such that (a, l) will be added to S, and lets us estimate |S| after the algorithm stops:
|S| < 4k (we skip the details). Similar ideas give: the total number of wasted edges
= waste + (3d − 1) · |S| < (15d − 3.5)k.

Our algorithm could fail only when it would attempt to use more than ( 2 ) edges. 
However, this is not the case, since with h{k) = 15d — 3.5, it is easy to show that 
+ h{k) ■ {k — h{k)) < ( 2 ), for every k. Thus the number of colors we have 

used is at least k — h{k). Now, recall that ^ — 1/ 2 5(d-i y ^ — 

VW\ -h{k)> - h{k) . 

The linear time implementation of the algorithm is quite natural and is omitted due 
to lack of space. The above ideas let us prove the following theorem. 

Theorem 5. Let T = (V, E) be any tree with maximum degree d = d(|R|). Given T,
the above algorithm produces in O(|E|) time a complete coloring of T with at least

\j 2 5 .(d-t y ■ ~ k{d) colors, where h{d) = 15 • d — 3.5. 



Since < 1.582, Theorem 5 gives an asymptotic 1.582-approximation
algorithm for the achromatic number problem on bounded degree trees. We can
assume, e.g., that d(|R|) = 0(log(|R|)), or even d(|R|) = (4 • (|R| — 1))^/"^ (see 
also below). We now improve the algorithm for binary trees. A binary tree is a tree with
maximum degree 3. Let a 2-tendril be a path of length at most 2. We present a lemma
analogous to Lemma 4 and a theorem based on it.

Lemma 6. In any binary tree T = (V,E) there exists a system of paths with 2-tendrils, 
with at least | • edges from E. Moreover, each link is adjacent to the first vertex of 
some path. 

Theorem 7. There is an asymptotic 1.155-approximation linear time algorithm for the
achromatic number problem on binary trees. Given a binary tree T, it produces a complete
coloring of T with at least · Ψ(T) − O(1) colors.

Farber et al. [7] prove , that the achromatic number of a tree with m edges and maximum 
degree (4m) is at least yfn. An obvious upper bound is s/2^/m. (Therefore, we may 
assume in Theorem 5 that d < (4 • (|y| — 1))^/^.) Using the coloring algorithm from 
Theorem 7 we obtain the following improvement. 

Theorem 8. Let T be any tree with m edges and maximum degree at most 3. Then we
have that Ψ(T) > 1.224 · √m − c, for some fixed positive constant c.

5 Coloring Arbitrary Trees 

In this section we give a 2-approximation algorithm for trees. Let T = (V, E) be a given
tree, and |E| = m. For an internal node v ∈ V of T, let star(v) be the set of all leaf
edges adjacent to v. Let Ψ(T) = k, and let σ be any complete coloring of T with k colors.
By E' ⊆ E we denote a set of essential edges for σ, which for each pair of different
colors i, j ∈ {1, . . . , k} contains one edge linking a vertex colored with i and a vertex
colored with j. Notice that |E'| = (k choose 2).
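
The notions of a complete coloring and of a set of essential edges can be checked mechanically on small examples. The following sketch is an illustration only; the function names and the representation (an edge list plus a dictionary from vertices to colors) are assumptions made here, not notation from the paper.

```python
from itertools import combinations


def is_complete_coloring(edges, color):
    """True if `color` is proper and every pair of distinct used colors
    appears on the endpoints of at least one edge."""
    if any(color[u] == color[v] for u, v in edges):
        return False  # two adjacent vertices share a color
    used = set(color.values())
    seen = {frozenset((color[u], color[v])) for u, v in edges}
    return all(frozenset(pair) in seen for pair in combinations(used, 2))


def essential_edges(edges, color):
    """Pick one edge per pair of colors (assumes the coloring is complete)."""
    chosen = {}
    for u, v in edges:
        chosen.setdefault(frozenset((color[u], color[v])), (u, v))
    return list(chosen.values())


# Path a-b-c-d, a tree with 3 edges, colored with k = 3 colors.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
color = {"a": 1, "b": 2, "c": 3, "d": 1}
print(is_complete_coloring(edges, color), len(essential_edges(edges, color)))
```

For this path all three color pairs are realized, so the set of essential edges has (3 choose 2) = 3 elements.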

Lemma 9. For each r < k and any internal vertices v\, . . . ,Vr € V the following 
holds I starivi) C\E'\< pj' above. 

The above lemma follows from the fact that the centers of the stars star(v_i) can be colored
with at most r colors, so each of the edges in ∪_{i=1}^{r} star(v_i) has one of its endpoints
colored with one of these r colors.

We say that a color l is saturated if, during the course of the algorithm, l has edge
connections with every other color that we use in the partial complete coloring. Condition
Cond (2) appearing below is defined in the proof of Theorem 10.

The algorithm is as follows. Set m m, and for each e £ E, mark e “good”. Step 
(1): Set k £- max{i : (*) < m}. Select a maximum size star star{v) (v G V) and 
unmark “good” for one of its edges. Continue this marking process until Cond (1): for 
each set of r (r < k) stars in T the number of “good” edges is < • r. Set 

m <— #( all edges of E marked “good”). End Step (1). If ( 2 ) > m, then go to Step (1). 
Replace tree T with T with only “good” edges. 

We have now two cases. If #( leaf edges of T) < |m, then compute a system of 
paths S in T, and optimally color S by Lemma 3 . 

In the other case, if #( leaf edges of T) > |m, proceed as follows. Sort all stars 
{star(v) : v ∈ V} of T w.r.t. their sizes. Let s_1 ≥ s_2 ≥ . . . be these sizes. Find the
smallest integer i such that s_i < — i. We consider two cases. If i > — 1, color the

first i largest stars with colors: center of j-th star {j = 1,2, , i) is colored with 
color j,andits -^—j sons are colored with the consecutive colors j + + 2 , . . . , 

Else, if i < — 1, use the procedure from the case of “i > — 1” for the first i — 1 

largest stars, saturating colors 1,2,... , i — 1: center of j-th star is colored with color j 
and its sons with colors j + 1, j + 2, .. . , for j = 1, 2, . . . , z — 1. The remaining 

colors i,i + l, . . . , will be saturated using the fact that now we have many stars with 

small sizes (< We namely perform the following steps, called the final saturation 
steps. 

Partition all the stars into two sets B_1 and B_2 of all stars with centers at odd distance
from the root of T and, respectively, at even distance from the root. Set j ← i. Now, we try to
saturate color j. While Cond (2) do: take for B_p one of the sets B_1, B_2 with the greater
number of edges; keep on picking consecutive maximum size stars from B_p until
color j is saturated; delete these stars from B_p. Set j ← j + 1. End While. This completes
the description of the algorithm.

Theorem 10. The above algorithm is a 2-approximation algorithm for the achromatic
number problem on a tree.



Proof. Note, that if condition Cond (1) holds for any set of r (r < k) stars of the greatest 
size, then it also holds for any set of r stars. Thus it is possible to perform procedure Step 
(1) in polynomial time. To prove the approximation ratio we show that the algorithm 
finds a complete coloring with at least colors, which is half of optimal number of 
colors. If #( leaf edges of T) < |m, then by Lemma 1, S has at least edges, so the 
resulting partial complete coloring uses > colors. We prove this also for case of 
leaf edges of T) > |m. We have many edges in the stars. We show that the algorithm 
colors the stars with > colors. By procedure Step (1) and Lemma 9, the number 
of edges in f — 1 stars we have used before the final saturation steps is at most A ~ 
( 72 v^-i)+(y 2 vTii-z+i) . ^ ^ Defining ^ ^ (i - 1) + ^ • (i - 1) , 

we have A = A' . The first element of A' can be considered as the number of edges we 
have used effectively in the coloring, while the second element of this sum can be 
considered as the number of edges we have lost (before final saturation steps). We prove 
that the final saturation steps will saturate all colors z, z + 1, . . . , . Cond (2) is defined 

to be true iff: e\ = #( edges in B\) > ^ — j or 62 = #( edges in B 2 ) > ^ — j. 
So the procedure stops if ei + 62 < 2 • — j — 1) = S. The formula for A 

together with A! can be generalized also to the case of coloring in case “z < — 1” 

together with “While Cond (2)” procedure. (We just glue all centers of the stars used 
to saturate color j in “While Cond (2)” into one star, for j = z, z + 1, . . . .) So after 
the above procedure, when Cond (2) is not true any more, we have used so far at most 

• (j — 1) edges. Thus, if Cond (2) will not be true, then 



^ _ (v^y7i-l)+(v^v7i-j+l)
^m — A < It is easy to show that the smallest j, such that ^m — A < d, is greater than

the number of colors to saturate. Obviously, the algorithm has polynomial running 
time. □ 

6 Coloring Large Girth Graphs 

In this section we prove that for graphs with girth at least six, the achromatic number 
can be approximated in polynomial time with ratio 

We define in a given graph G = (V, E) a subset M ⊆ E to be an independent
matching if no two edges in M have a common vertex and there is no edge in E \ M
adjacent to more than one edge in M. Let G = (V, E) be a given graph, with |V| = n,
|E| = m, and with girth at least six. For a given edge e ∈ E, let N(e) denote the set of
all edges at distance at most one from e. We now describe the algorithm of Chaudhary
and Vishwanathan [5] with our modification. The steps below are performed for all
parameters f = 1, 2, . . . , m.

1. Set I ← ∅, i ← 1.

2. Choose any edge e_i ∈ E and set I ← I ∪ {e_i}, E ← E \ N(e_i).

3. If E ≠ ∅ then set i ← i + 1, go to step 2.

4. If |I| ≥ f then output a partial complete coloring using edges in I, else partition each
N(e_i) into two trees by removing the edge e_i. Then use the algorithm of Section 5
to produce a partial complete coloring for each such tree and output the largest size
coloring.
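
Steps 1 to 3 above amount to a greedy construction of an independent matching, which can be sketched as follows. The sketch rests on one explicit assumption about the definition of N(e): it is taken to be the set of edges with at least one endpoint in the closed neighborhood of the endpoints of e (so it contains e itself). The function names are hypothetical, and the graph is assumed to be simple and given as a list of vertex pairs.

```python
def edge_neighborhood(edges, e):
    """N(e) under the assumption stated above: all edges having an endpoint
    among the endpoints of e or their neighbors (this includes e itself)."""
    u, v = e
    near = {u, v}
    for a, b in edges:
        if a in (u, v):
            near.add(b)
        if b in (u, v):
            near.add(a)
    return {(a, b) for a, b in edges if a in near or b in near}


def greedy_independent_matching(edges):
    """Steps 1-3: repeatedly pick a remaining edge e_i and delete N(e_i)."""
    remaining = list(edges)
    matching = []
    while remaining:
        e = remaining[0]                       # "choose any edge"
        matching.append(e)
        removed = edge_neighborhood(edges, e)  # N(e_i) taken in the input graph
        remaining = [f for f in remaining if f not in removed]
    return matching


def is_independent_matching(edges, matching):
    """No two matching edges share a vertex, and no other edge is adjacent
    to two different matching edges."""
    owner = {}
    for idx, (u, v) in enumerate(matching):
        if u in owner or v in owner:
            return False
        owner[u] = owner[v] = idx
    for a, b in edges:
        if (a, b) in matching:
            continue
        if a in owner and b in owner and owner[a] != owner[b]:
            return False
    return True


# Path on 8 vertices (no cycles, so the girth condition holds trivially).
path = [(i, i + 1) for i in range(7)]
print(greedy_independent_matching(path))
```

On the path the sketch selects every third edge, and the checker confirms that the result is an independent matching in the sense defined above.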



Theorem 11. Let G be a graph with n vertices and with girth at least six. The above
algorithm is a (√2 + ε) · √Ψ(G)-approximation algorithm for the achromatic number
problem on G (for any ε > 0). Moreover (√2 + ε) · √Ψ(G) =



Proof. I is an independent matching and having any independent matching of size ( 2 ), 
we can generate a partial complete coloring of size 1. In this case we have a coloring of 
size > In Ihs other case, since girth > 6, removing the edge Ci from N{ei), vertex 
set of N{ei) can be partitioned into two trees. Consider a maximum coloring c of G 
and let E' denote a set of essential edges for a. If | J| < /, then at least one of the sets 
N{ei), say N{eif, contains > {\E'\/ f) essential edges. N{ei/} consists of two trees, 
so one of them contains > — I essential edges. So we can set m = ^ in 

the proof of Theorem 10 (see Step (1) of algorithm in Section 5), and from this proof 

the tree can be colored with at least c = colors. Thus the number of colors is c = 
— 1) — 2/, where E = 'P{G), and \E'\ =(f). For /, such that c = \/2/» 
^^^(^_l)_2/ = So, c = ^ IQE + l- 1. It can be shown 

that c> c' ■ s/^, for a constant c' = and e > 0. Since c > c'y/^ = 
the approximation ratio is (y/2 + t)s/E. Any n-vertex graph with girth at least g has at 
mostn[n^] edges [2], and!?' = 0(y/|^), so the approximationratio is □

7 Open Problems

We suggest the following open problems: (1) Improving the approximation ratios. (2) Is 
there an O(v^) -approximation algorithm for the problem on other classes of graphs? 
(3) Is the achromatic number problem on trees Max SNP-hard?



Abstract. Given an undirected multigraph G = (V, E) and two positive integers
ℓ and k, we consider the problem of augmenting G by the smallest number of
new edges to obtain an ℓ-edge-connected and k-vertex-connected multigraph. In
this paper, we show that a (k − 1)-vertex-connected multigraph G (k ≥ 4) can
be made ℓ-edge-connected and k-vertex-connected by adding at most 2ℓ surplus
edges over the optimum, in O(min{k, √n} kn^3 + n^4) time, where n = |V|.



1 Introduction 

The problem of augmenting a graph by adding the smallest number of new edges to meet
edge-connectivity or vertex-connectivity requirements has been extensively studied as an important
subject in network design, and many efficient algorithms have been developed so far. However,
algorithms for augmenting both edge-connectivity and vertex-connectivity simultaneously
have been obtained only very recently (see [9,10,11,12] for those results).

Let G = (V, E) stand for an undirected multigraph with a set V of vertices and a set E of
edges. We denote the number of vertices by n, and the number of pairs of adjacent vertices by m.
The local edge-connectivity λ_G(x, y) (resp., the local vertex-connectivity κ_G(x, y)) is defined to
be the maximum ℓ (resp., k) such that there are ℓ edge-disjoint (resp., k internally vertex-disjoint)
paths between x and y in G (where at most one edge between x and y is allowed to be in the set of
internally vertex-disjoint paths). The edge-connectivity and vertex-connectivity of G are defined
by λ(G) = min{λ_G(x, y) | x, y ∈ V, x ≠ y} and κ(G) = min{κ_G(x, y) | x, y ∈ V, x ≠ y}.
We call a multigraph G ℓ-edge-connected if λ(G) ≥ ℓ. Analogously, G is k-vertex-connected if
κ_G(x, y) ≥ k for all x, y ∈ V. The edge-connectivity augmentation problem (resp., the
vertex-connectivity augmentation problem) asks to augment G by the smallest number of new edges so
that the resulting multigraph G' becomes ℓ-edge-connected (resp., k-vertex-connected).
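
As an illustration of the local edge-connectivity (the maximum number of edge-disjoint paths between two vertices), the following sketch computes it for a small multigraph by repeatedly finding augmenting paths with breadth-first search. It is not an algorithm from the paper: the edge-list representation (parallel edges listed repeatedly) and the function names are assumptions made for this example.

```python
from collections import defaultdict, deque


def local_edge_connectivity(edges, x, y):
    """Maximum number of edge-disjoint x-y paths in an undirected multigraph."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue            # loops never lie on a path between distinct vertices
        cap[(u, v)] += 1        # an undirected edge may be used in either direction
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {x: None}
        queue = deque([x])
        while queue and y not in parent:       # BFS in the residual graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if y not in parent:
            return flow                        # no augmenting path is left
        v = y
        while parent[v] is not None:           # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def edge_connectivity(vertices, edges):
    """Minimum local edge-connectivity; fixing one endpoint suffices."""
    vs = list(vertices)
    return min(local_edge_connectivity(edges, vs[0], v) for v in vs[1:])


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(local_edge_connectivity(cycle4, 0, 2), edge_connectivity(range(4), cycle4))
```

A 4-cycle is 2-edge-connected, so making it 3-edge-connected requires new edges; the augmentation problem asks for the smallest such set.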

As to the edge-connectivity augmentation problem, Watanabe and Nakamura [20] first proved
that the problem can be solved in polynomial time for any given integer ℓ. However, some special
cases of this problem are NP-hard; for example, augmenting G to attain ℓ-edge-connectivity while
preserving simplicity of the given graph [1,15].

As to the vertex-connectivity augmentation problem, the problem of making a (k − 1)-vertex-
connected multigraph k-vertex-connected was shown to be polynomially solvable for k = 2
[3] and for k = 3 [21]. It was later found out that, for k ∈ {2, 3, 4}, the vertex-connectivity
augmentation problem can be solved in polynomial time in [3,7] (for k = 2), [6,21] (for k = 3),
and [8] (for k = 4), even if the input multigraph G is not necessarily (k − 1)-vertex-connected.
However, whether there is a polynomial time algorithm for an arbitrary constant k was still an
open question (even if G is (k − 1)-vertex-connected). Recently Jordan presented an O(n^5) time
approximation algorithm for k ≥ 4 [13,14] in which the difference between the number of new
edges added by the algorithm and the optimal value is at most (k − 2)/2.

The problem of augmenting both edge- and vertex-connectivities has been studied in [9,10,
11,12]. For two integers ℓ and k, we say that G is (ℓ, k)-connected if G is ℓ-edge-connected
and k-vertex-connected. The edge-and-vertex-connectivity augmentation problem, denoted by
EVAP(ℓ, k), asks to augment G by adding the smallest number of new edges so that the resulting
multigraph G' becomes (ℓ, k)-connected, where ℓ ≥ k is assumed without loss of generality.
Reeently, the authors proved that EVAP(f, 2) ean be solved in 0{{nm + logn) logn) time 
[10], and that EVAP(f, 3) ean be solved in polynomial time (in partieular, 0(n"‘) time if an input 
graph is 2-vertex-eonnected) for any fixed integer f [ 1 1 , 1 2] . However, for arbitrary integers £ and 
k, whether there is a polynomial time algorithm for EVAP(ℓ, k) in which the difference between
the number of new edges added by the algorithm and the optimal value is O(ℓ) was still an open
question.

In this paper, we consider problem EVAP(ℓ, k) for a (k − 1)-vertex-connected graph. One
may consider an algorithm that first meets one of the edge-connectivity and vertex-connectivity
requirements, and then meets the other requirement. It is easily seen that this sequential algorithm
does not lead to an optimal solution. In this paper, we propose an algorithm that augments the
graph by considering both edge-connectivity and vertex-connectivity in each step. However, even in
this case, the resulting multigraph may not be optimally augmented from the original multigraph.
To evaluate the approximation error, we first present a lower bound on the number of edges that is
necessary to make a given multigraph G (ℓ, k)-connected, and then show that the lower bound
plus 2ℓ edges suffice if the input graph is (k − 1)-vertex-connected with k ≥ 4. The task of
constructing such a set of new edges can be done in O(min{k, √n} kn^3 + n^4) time.

2 Preliminaries 

2.1 Definitions 

For a multigraph G = (V, E), an edge with end vertices u and v is denoted by (u, v). Given
two disjoint subsets of vertices X, Y ⊂ V, we denote by E_G(X, Y) the set of edges connecting
a vertex in X and a vertex in Y, and denote c_G(X, Y) = |E_G(X, Y)|. A singleton set {x} is
also denoted x. In particular, E_G(u, v) is the set of multiple edges with end vertices u and v and
c_G(u, v) = |E_G(u, v)| denotes its multiplicity. Given a multigraph G = (V, E), its vertex set V
and edge set E may be denoted by V(G) and E(G), respectively. For a subset V' ⊆ V (resp.,
E' ⊆ E) in G, G[V'] (resp., G[E']) denotes the subgraph induced by V' (resp., G[E'] = (V, E')).
For V' ⊂ V (resp., E' ⊆ E), we denote the subgraph G[V − V'] (resp., G[E − E']) also by G − V'
(resp., G − E'). For E' ⊆ E, we denote V(G[E']) by V[E']. For an edge set F with F ∩ E = ∅,
we denote the augmented graph G = (V, E ∪ F) by G + F. A partition X_1, . . . , X_t of a vertex
set V is a family of nonempty disjoint subsets X_i of V whose union is V, and a subpartition of
V is a partition of a subset of V. A cut is defined to be a subset X of V with ∅ ≠ X ≠ V, and
the size of a cut X is defined by c_G(X, V − X), which may also be written as c_G(X). A subset
X intersects another subset Y if none of the subsets X ∩ Y, X − Y and Y − X is empty. A family
𝒳 of subsets X_1, . . . , X_p is called laminar if no two subsets in 𝒳 intersect each other (however,
possibly X_i ⊆ X_j for some X_i, X_j ∈ 𝒳).
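
The definitions of intersecting subsets and of a laminar family translate directly into code. The following sketch is only an illustration; the function names are chosen here and are not notation from the paper.

```python
from itertools import combinations


def intersects(X, Y):
    """X intersects Y if X ∩ Y, X − Y and Y − X are all nonempty."""
    X, Y = set(X), set(Y)
    return bool(X & Y) and bool(X - Y) and bool(Y - X)


def is_laminar(family):
    """A family is laminar if no two of its members intersect each other."""
    return not any(intersects(X, Y) for X, Y in combinations(family, 2))


print(is_laminar([{1, 2, 3}, {1, 2}, {4}]), is_laminar([{1, 2}, {2, 3}]))
```

Nested or disjoint sets never intersect in this sense, which is why the first family is laminar while the second is not.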

For a subset X of V, a vertex v ∈ V − X is called a neighbor of X if it is adjacent to some
vertex u ∈ X, and the set of all neighbors of X is denoted by Γ_G(X). A maximal connected
subgraph G' in a multigraph G is called a component of G, and the number of components in G is
denoted by p(G). A disconnecting set of G is defined as a cut S of V such that p(G − S) > p(G)
holds and no S' ⊂ S has this property. Let Ĝ denote the simple graph obtained from G by
replacing multiple edges in E_G(u, v) by a single edge (u, v) for all u, v ∈ V. A component G'
of G with |V(G')| > 3 always has a disconnecting set unless G' is a complete graph. If G is
connected and contains a disconnecting set, then a disconnecting set of the minimum size is called
a minimum disconnecting set, whose size is equal to κ(G). A cut T ⊂ V is called tight if Γ_G(T)
is a minimum disconnecting set in G. A tight set D is called minimal if no proper subset D' of D
is tight (hence, the induced subgraph G[D] is connected). We denote the family of all minimal tight
sets in G by 𝒟(G), and denote the maximum number of pairwise disjoint minimal tight sets by
t(G). For a vertex set S in G, we call the components in G − S the S-components, and denote
the family of all S-components by 𝒞(G − S). Note that the vertex set S is a disconnecting set in
a connected multigraph G if and only if |𝒞(G − S)| ≥ 2. Clearly, for a minimum disconnecting
set S, every S-component is tight, and the union of two or more (but not all) S-components is
also tight.
Lemma 1. [13] Let G be k-vertex-connected. If t(G) > k + 1, then any two cuts X, Y ∈ 𝒟(G)
are pairwise disjoint (i.e., t(G) = |𝒟(G)|). □

Lemma 2. [18] Let C ⊆ E be a cycle in a k-vertex-connected graph G = (V, E) such that
κ(G − e) = k − 1 holds for every e ∈ C. Then there exists a vertex v ∈ V[C] with |Γ_G(v)| = k. □

We call a disconnecting set S a shredder if |𝒞(G − S)| ≥ 3. A tight set T is called a superleaf if
T contains exactly one cut in 𝒟(G) and no T' ⊃ T satisfies this property. The following lemmas
summarize some properties of superleaves.

Lemma 3. [2] Let G = (V, E) be a connected multigraph with t(G) > κ(G) + 3. Then every
two superleaves are pairwise disjoint. Hence, a superleaf is disjoint from all other cuts in 𝒟(G)
except for the cut in 𝒟(G) contained in it. □

Lemma 4. Let S be a minimum shredder and D ∈ 𝒟(G) be a minimal tight set in a connected
graph G = (V, E). If D is contained in a cut T ∈ 𝒞(G − S) and no cut in 𝒟(G) other than D
is contained in T, then T is the superleaf with T ⊇ D. □

Lemma 5. Let S be a minimum shredder in a connected graph G = (V, E). If p(G − S) >
κ(G) + 1 holds, then every superleaf Q in G satisfies Q ∩ S = ∅. □

Lemma 6. Let G be a connected multigraph with t(G) > κ(G) + 3. Then t(G + e) < t(G) − 1
holds for an edge e which connects two pairwise disjoint cuts D_1, D_2 ∈ 𝒟(G) with Γ_G(D_2) ∩
Q_1 = ∅, where Q_1 is the superleaf containing D_1. □



2.2 Preserving Edge-Connectivity 

Given a multigraph G = (V, E), a designated vertex s ∈ V, and a vertex u ∈ Γ_G(s), we
construct a graph G' = (V, E') by deleting one edge (s, u) from E_G(s, u), and adding a new
edge to E_G(s, v) with v ∈ V − s. We say that G' is obtained from G by shifting (s, u) to (s, v).

Let G = (V, E) satisfy λ_G(x, y) ≥ ℓ for all pairs x, y ∈ V − s. For an edge e = (s, v)
with v ∈ V − s, if there is a pair x, y ∈ V − s satisfying λ_{G−e}(x, y) < ℓ, then there is a unique
cut X ⊂ V − s with c_G(X) = ℓ and v ∈ X such that all cuts X' ⊂ X with v ∈ X' satisfy
c_G(X') > ℓ. We call such X λ-critical with respect to v ∈ Γ_G(s).




Augmenting a (fc — l)-Vertex-Connected Multigraph 417 



Theorem 7. Let G = (V, E) be a multigraph with a designated vertex s ∈ V such that
λ_G(x, y) ≥ ℓ for all x, y ∈ V − s. Let X be a λ-critical cut with respect to v ∈ Γ_G(s) if
any, X = V otherwise. Then for any v' ∈ X ⊂ V − s, the shifted multigraph G − (s, v) + (s, v')
satisfies λ_{G−(s,v)+(s,v')}(x, y) ≥ ℓ for all x, y ∈ V − s. □

Given a multigraph G = (V, E), a designated vertex s ∈ V, vertices u, v ∈ Γ_G(s) (possibly
u = v) and a nonnegative integer δ ≤ min{c_G(s, u), c_G(s, v)}, we construct a graph G' =
(V, E') by deleting δ edges from E_G(s, u) and E_G(s, v), respectively, and adding δ new edges
to E_G(u, v). We say that G' is obtained from G by splitting δ pairs of edges (s, u) and (s, v). A
sequence of splittings is complete if the resulting graph G' does not have any neighbor of s.
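
The splitting operation can be illustrated on a multigraph stored as a multiset of undirected edges. This is a minimal sketch under stated assumptions: the representation as a `collections.Counter` keyed by sorted vertex pairs, the helper names, and the restriction to the case u ≠ v are choices made for this example, not part of the paper.

```python
from collections import Counter


def key(u, v):
    """Canonical key of the undirected edge (u, v)."""
    return tuple(sorted((u, v)))


def split(mult, s, u, v, delta=1):
    """Split `delta` pairs of edges (s, u), (s, v): remove delta copies of each
    and add delta copies of (u, v). This sketch assumes u != v."""
    if u == v:
        raise ValueError("this sketch only handles u != v")
    if delta > min(mult[key(s, u)], mult[key(s, v)]):
        raise ValueError("delta exceeds an edge multiplicity")
    new = Counter(mult)
    new[key(s, u)] -= delta
    new[key(s, v)] -= delta
    new[key(u, v)] += delta
    return +new  # unary plus drops edges whose multiplicity fell to zero


def is_complete(mult, s):
    """A sequence of splittings is complete if s has no neighbor left."""
    return all(s not in edge for edge in mult)


# s is joined to a and to b by two parallel edges each; a and b are adjacent.
G = Counter({key("s", "a"): 2, key("s", "b"): 2, key("a", "b"): 1})
H = split(G, "s", "a", "b", delta=2)
print(dict(H), is_complete(H, "s"))
```

Splitting both pairs at once removes every edge at s and raises the multiplicity of (a, b) from 1 to 3, so the splitting is complete.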

Let G = (V, E) satisfy λ_G(x, y) ≥ ℓ for all pairs x, y ∈ V − s. A pair {(s, u), (s, v)} of two
edges in E_G(s) is called λ-splittable if the multigraph G' resulting from splitting edges (s, u)
and (s, v) satisfies λ_{G'}(x, y) ≥ ℓ for all pairs x, y ∈ V − s. The following theorem is proven by
Lovasz [17, Problem 6.53].

Theorem 8. [5,17] Let G = (V, E) be a multigraph with a designated vertex s ∈ V with even
c_G(s), and ℓ ≥ 2 be an integer such that λ_G(x, y) ≥ ℓ for all x, y ∈ V − s. Then for each
u ∈ Γ_G(s) there is a vertex v ∈ Γ_G(s) such that {(s, u), (s, v)} is λ-splittable. □

Repeating the splitting in this theorem, we see that, if c_G(s) is even, there always exists a
complete splitting at s such that the resulting graph G' satisfies λ_{G'−s}(x, y) ≥ ℓ for every pair
of x, y ∈ V − s. It is shown in [19] that such a complete splitting at s can be computed in
O((m + n log n) n log n) time.

We call a cut X ⊂ V − s dangerous if c_G(X) ≤ ℓ + 1 holds. Note that {(s, u), (s, v)} is
not λ-splittable if and only if there is a dangerous cut X ⊂ V − s with {u, v} ⊆ X.

Theorem 9. [5] Let G = (V, E) be a multigraph with a designated vertex s ∈ V, and ℓ ≥ 2 be
an integer such that λ_G(x, y) ≥ ℓ for all pairs x, y ∈ V − s. Let u ∈ Γ_G(s). Then there are at
most two maximal dangerous cuts X with u ∈ X; i.e., no cut X' ⊃ X with u ∈ X' is dangerous.
In particular, if there are exactly two maximal dangerous cuts X_1 and X_2 with u ∈ X_1 ∩ X_2,
then c_G(X_1 ∪ X_2) = ℓ + 2, c_G(X_1 ∩ X_2) = ℓ, c_G(X_1 − X_2) = c_G(X_2 − X_1) = ℓ, and
c_G(s, X_1 ∩ X_2) = 1 hold. □



Corollary 10. Let G = (V, E) be a multigraph satisfying the assumption of Theorem 9 and
λ(G − s) ≥ k. For each vertex u ∈ Γ_G(s), |{v ∈ Γ_G(s) | {(s, u), (s, v)} is not λ-splittable}| ≤
|{e ∈ E_G(s) | {(s, u), e} is not λ-splittable}| ≤ ℓ + 2 − k. □

Conversely, we say that G' is obtained from G by hooking up an edge (u, v), if we construct
G' by replacing (u, v) with two edges (s, u) and (s, v) in G.

2.3 Preserving Vertex-Connectivity 

Let G = (V, E) denote a multigraph with s ∈ V and |V| > k + 2 such that κ_G(x, y) ≥ k holds
for all pairs x, y ∈ V − s. A pair {(s, u), (s, v)} of two edges in E_G(s) is called κ-splittable if
the multigraph G' resulting from splitting edges (s, u) and (s, v) satisfies κ_{G'}(x, y) ≥ k for all
pairs x, y ∈ V − s. If G − s is not k-vertex-connected, then κ(G − s) = k − 1 holds, since G
satisfies κ_{G−s}(x, y) ≥ κ_G(x, y) − 1 ≥ k − 1 for all pairs x, y ∈ V − s. Hence if {(s, u), (s, v)}
is not κ-splittable, then the resulting graph G' has a disconnecting set S ⊂ V − s with |S| = k − 1
and p(G' − S) = 2, and a cut T ∈ 𝒞(G' − S) with T ⊂ V − s, {u, v} ∩ T ≠ ∅, {u, v} ⊆ T ∪ S
and E_{G'}(s, T) = ∅. The following theorem for a shredder is given by [2, Lemma 5.6].

Theorem 11. [2] Let G = (V, E) be a multigraph with a designated vertex s ∈ V and k ≥ 2
be an integer such that κ_G(x, y) ≥ k for all pairs x, y ∈ V − s, κ(G − s) = k − 1, and
t(G − s) > k + 2. Let Q_1, Q_2, and Q_3 be three distinct superleaves in G − s such that
Γ_{G−s}(Q_1) ∩ Q_2 = ∅ = Γ_{G−s}(Q_1) ∩ Q_3 and Γ_{G−s}(Q_1) is not a shredder in G − s. Then, for
x ∈ Γ_G(s) ∩ D_1, y ∈ Γ_G(s) ∩ D_2, and z ∈ Γ_G(s) ∩ D_3, where D_i ⊆ Q_i, i = 1, 2, 3, denote
the cuts in 𝒟(G − s), at least one of {(s, x), (s, y)}, {(s, y), (s, z)}, and {(s, z), (s, x)} is
κ-splittable. Moreover, t(G') = t(G) − 2 holds for the resulting graph G' (note that Γ_G(s) ∩ D_i ≠ ∅
holds for i = 1, 2, 3 since otherwise κ_G(x, y) ≥ k cannot hold for all pairs x, y ∈ V − s). □

The following result is a slight generalization of [2, Lemma 5.7], in which a graph G is assumed
to satisfy κ_G(x, y) ≥ k (≥ 2) for all pairs x, y ∈ V − s, but removal of any edge incident to s
from G violates this property.

Theorem 12. Let G = (V, E) be a multigraph with a designated vertex s ∈ V and k ≥ 2 be an
integer such that κ_G(x, y) ≥ k for all pairs x, y ∈ V − s, κ(G − s) = k − 1 and t(G − s) >
max{2k − 2, k + 2}. Let Q_1 ⊂ V − s be an arbitrary superleaf such that S = Γ_{G−s}(Q_1) is a
shredder in G − s. If G − s has a cut T ∈ 𝒞((G − s) − S) − {Q_1} with c_G(s, T) > 2, then
{(s, x), (s, y)} is κ-splittable for any pair {x, y} such that x ∈ Γ_G(s) ∩ Q_1 and y ∈ Γ_G(s) ∩ T. □

2.4 Lower Bound 

For a multigraph G and fixed integers ℓ ≥ k ≥ 4, let opt(G) denote the optimal value of
EVAP(ℓ, k) in G, i.e., the minimum size |F| of a set F of new edges to obtain an (ℓ, k)-connected
graph G + F. In this section, we derive two types of lower bounds, α(G) and β(G), on opt(G).

Let A be a cut in G. To make G f -edge-connected and fc- vertex-connected, it is necessary to 
add at least max{f — cq (A) , 0} edges between X and y — , or at least max{fc — |Fg (X) | , 0} 

edges between X and V — X — Eg(X) ifV — X — Eg(X) f 0. Given a subpartition X = 
{Xi, • • • , Xp, Xp+i, ■■■,X^} of y, where V-Xi- rc(Xi) 7^ 0 holds for i = p + 1, • • • , g, 
we can sum up “deficiencies” max{f — co(Xi), 0}, f = 1, • • • ,p, and max{fc — |FG(Xi)|, 0}, 
f = p+ l,---,g.As adding one edge to G contributes to the deficiency of at most two cuts in X, 
we need at least \a(G) /2] new edges to make G (£, fc)-connected, where 



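\[
\alpha(G) = \max_{\text{all subpartitions } \mathcal{X}} \left\{ \sum_{i=1}^{p} \bigl(\ell - c_G(X_i)\bigr) + \sum_{i=p+1}^{q} \bigl(k - |\Gamma_G(X_i)|\bigr) \right\}, \tag{2.1}
\]

and the maximum is taken over all subpartitions $\mathcal{X} = \{X_1, \ldots, X_p, X_{p+1}, \ldots, X_q\}$ of $V$ with $V - X_i - \Gamma_G(X_i) \ne \emptyset$, $i = p + 1, \ldots, q$.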
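The quantity inside the maximum of (2.1) can be evaluated for one given subpartition as in the following minimal sketch. It is an illustration only: the multigraph is assumed to be stored as a `Counter` of vertex pairs, and the names `cut_size`, `neighbours`, and `deficiency` are hypothetical helpers, not part of the paper. Computing $\alpha(G)$ itself would require maximising over all subpartitions, which this sketch does not attempt.

```python
from collections import Counter

def cut_size(edges, X):
    """c_G(X): number of edges (with multiplicity) having exactly one end in X."""
    X = set(X)
    return sum(m for (u, v), m in edges.items() if (u in X) != (v in X))

def neighbours(edges, X):
    """Gamma_G(X): vertices outside X adjacent to some vertex of X."""
    X = set(X)
    out = set()
    for (u, v) in edges:
        if u in X and v not in X:
            out.add(v)
        elif v in X and u not in X:
            out.add(u)
    return out

def deficiency(edges, vertices, edge_parts, vertex_parts, ell, k):
    """Value of the bracket in (2.1) for the subpartition edge_parts + vertex_parts.

    edge_parts play the role of X_1..X_p, vertex_parts of X_{p+1}..X_q.
    """
    parts = [set(X) for X in list(edge_parts) + list(vertex_parts)]
    # A subpartition consists of pairwise disjoint nonempty subsets of V.
    assert all(parts) and sum(len(X) for X in parts) == len(set().union(*parts))
    total = sum(ell - cut_size(edges, X) for X in edge_parts)
    for X in vertex_parts:
        gamma = neighbours(edges, X)
        # (2.1) requires V - X - Gamma_G(X) to be nonempty for these sets.
        assert set(vertices) - set(X) - gamma
        total += k - len(gamma)
    return total

# Path a-b-c-d; small parameters chosen only to keep the example readable.
path = Counter({("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1})
value = deficiency(path, "abcd", [{"a"}], [{"d"}], ell=3, k=2)
```

For this subpartition the edge part contributes $3 - 1 = 2$ and the vertex part contributes $2 - 1 = 1$, so the bracket equals 3.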

We now consider another case in which a different type of new edges becomes necessary. For a disconnecting set $S$ of $G$ with $|S| \le k - 1$, let $T_1, \ldots, T_q$ denote all the components in $G - S$, where $q = p(G - S)$. To make $G$ $k$-vertex-connected, a new edge set $F$ must be added to $G$ so that all $T_i$ form a single connected component in $(G + F) - S$. For this, it is necessary to add at least $p(G - S) - 1$ edges to connect all components in $G - S$. Define

\[
\beta(G) = \max\{\, p(G - S) : S \text{ is a disconnecting set in } G \text{ with } |S| < k \,\}. \tag{2.2}
\]

Thus at least $\beta(G) - 1$ new edges are necessary to make $G$ $(\ell, k)$-connected.
Lemma 13. (Lower Bound) For a given multigraph $G$, let

$\gamma(G) = \max\{\lceil \alpha(G)/2 \rceil, \beta(G) - 1\}$.

Then $\gamma(G) \le \mathrm{opt}(G)$ holds, where $\mathrm{opt}(G)$ denotes the minimum number of edges augmented to make $G$ $(\ell, k)$-connected. □

Remark: Both $\alpha(G)$ and $\beta(G)$ can be computed in polynomial time. The algorithm for computing $\alpha(G)$ will be mentioned in Section 4. It can be seen that $\beta(G)$ is computable in polynomial time, since finding all minimum shredders in the given graph can be done in polynomial time [2]. □
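To make the definition of $\beta(G)$ concrete, the following sketch evaluates (2.2) by exhaustive search on a tiny graph. This is deliberately not the polynomial-time shredder-based method referred to in the remark; it simply enumerates every vertex set $S$ with $|S| < k$, and the function names are hypothetical.

```python
from itertools import combinations

def components(vertices, edges, removed):
    """p(G - removed): number of connected components after deleting `removed`."""
    alive = set(vertices) - set(removed)
    adj = {v: set() for v in alive}
    for u, v in edges:
        if u in alive and v in alive:
            adj[u].add(v)
            adj[v].add(u)
    seen, count = set(), 0
    for v in alive:
        if v in seen:
            continue
        count += 1
        stack = [v]
        seen.add(v)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return count

def beta(vertices, edges, k):
    """Brute-force beta(G): max p(G - S) over disconnecting sets S with |S| < k.

    Returns 0 when no such disconnecting set exists (an assumption of this sketch).
    """
    best = 0
    for size in range(k):  # sizes 0 .. k-1, i.e. |S| < k
        for S in combinations(vertices, size):
            p = components(vertices, edges, S)
            if p >= 2:  # S disconnects G
                best = max(best, p)
    return best

# Star with centre c and three leaves: deleting c leaves three components.
star_vertices = ["c", "x", "y", "z"]
star_edges = [("c", "x"), ("c", "y"), ("c", "z")]
star_beta = beta(star_vertices, star_edges, k=2)
```

For the star, `star_beta` is 3, so the bound $\beta(G) - 1$ says that at least two new edges are needed to make it 2-vertex-connected.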



Based on this, we shall prove the next result in this paper. 

Theorem 14. Let $G$ be a $(k - 1)$-vertex-connected multigraph with $k \ge 4$. Then, for any integer $\ell \ge k$, $\gamma(G) \le \mathrm{opt}(G) \le \gamma(G) + 2\ell$ holds, and a feasible solution $F$ of EVAP($\ell, k$) with $\gamma(G) \le |F| \le \gamma(G) + 2\ell$ can be found in $O(\min\{k, \sqrt{n}\} k n^3 + n^4)$ time, where $n$ is the number of vertices in $G$. □

3 An Algorithm for EVAP($\ell, k$)

In this section, we present a polynomial time algorithm, called EV-AUG, for finding a nearly optimal solution to EVAP($\ell, k$) for a given $(k - 1)$-vertex-connected graph and given integers $\ell \ge k \ge 4$. The algorithm EV-AUG consists of the following three major steps. In each step, we also give some properties to verify its correctness. The proofs of these properties will be given in the subsequent sections.

Algorithm EV-AUG 

Input: An undirected multigraph $G = (V, E)$ with $|V| \ge k + 1$, $\kappa(G) = k - 1$, and integers $\ell \ge k \ge 4$.

Output: A set of new edges $F$ with $|F| \le \mathrm{opt}(G) + 2\ell$ such that $G^* = G + F$ satisfies $\lambda(G^*) \ge \ell$ and $\kappa(G^*) \ge k$.

Step I (Addition of vertex s and associated edges): If $t(G) \le 2\ell + 1$ holds, then we can see from Lemma 2 that $G$ can be made $k$-vertex-connected by adding a new edge set $F_0$ with $|F_0| \le 2\ell$. Moreover, $G$ can be made $\ell$-edge-connected by adding a new edge set $F_0'$ with $|F_0'| \le \lceil \alpha(G)/2 \rceil$, by using the algorithm AUGMENT in [19]. Output $F_0 \cup F_0'$ as a solution, which satisfies $|F_0| + |F_0'| \le \mathrm{opt}(G) + 2\ell$.

If $t(G) \ge 2\ell + 2$ holds, then Lemma 1 says that every two cuts in $\mathcal{D}(G)$ are pairwise disjoint. Then add a new vertex $s$ together with a set $F_1$ of edges between $s$ and $V$ so that the resulting graph $G_1 = (V \cup \{s\}, E \cup F_1)$ satisfies

$c_{G_1}(X) \ge \ell$ for all cuts $X$ with $\emptyset \ne X \subset V$, (3.1)

$c_{G_1}(s, D) \ge 1$ for all minimal tight sets $D \in \mathcal{D}(G)$, (3.2)

where $|F_1|$ is minimum subject to (3.1) and (3.2) (Section 4 describes how to find such a minimum $F_1$).

Property 15. The above set of edges $F_1$ satisfies $|F_1| = \alpha(G)$. □
If $c_{G_1}(s)$ is odd, then add one edge $e = (s, w)$ to $F_1$ for a vertex $w$ arbitrarily chosen from $V$. Denote the resulting graph again by $G_1$. After setting $G' := G_1$ and $\overline{G'} := G_1 - s$, we go to Step II.

Step II (Edge-splitting): While $t(G') \ge 2\ell + 2$ holds (hence superleaves in $G'$ are pairwise disjoint by Lemma 3), repeat the following procedure (A) or (B); if the condition

$\beta(G') - 1 \ge \lceil t(G')/2 \rceil \ (\ge \ell + 1)$ (3.3)

holds, then procedure (A) else procedure (B) (If $t(G') \le 2\ell + 1$, then go to Step III).

Procedure (A) 

Choose a minimum disconnecting set $S^*$ in $G'$ satisfying $p(G' - S^*) = \beta(G')$.

(Case-1) $G'$ has a cut $T^* \in \mathcal{C}(G' - S^*)$ with $c_{G'}(s, T^*) \ge 2$. Then the next property holds.

Property 16. In Case-1, there is a pair of two edges $\{(s, u), (s, v)\}$ such that (i) $u$ and $v$ are contained in distinct cuts in $\mathcal{C}(G' - S^*)$, (ii) $\{(s, u), (s, v)\}$ is $\lambda$-splittable and $\kappa$-splittable, and (iii) the multigraph $G'' := G' - \{(s, u), (s, v)\} + (u, v)$ resulting from splitting edges $(s, u)$ and $(s, v)$ in $G'$ satisfies $\beta(G'' - s) = \beta(G') - 1$. □

Then split the two edges $\{(s, u), (s, v)\}$ in Property 16. Set $G' := G' - \{(s, u), (s, v)\} + \{(u, v)\}$, $\overline{G'} := G' - s$, and go to Step II.

(Case-2) Every cut $T^* \in \mathcal{C}(G' - S^*)$ satisfies $c_{G'}(s, T^*) = 1$.

Property 17. In Case-2, if one of the following cases (1) - (4) holds, then there is a $\lambda$-splittable and $\kappa$-splittable pair of two edges incident to $s$ after hooking up at most one split edge and shifting at most one edge incident to $s$. Otherwise we can make $G$ $(\ell, k)$-connected by adding $\beta(G) - 1$ edges.

(1) There is an edge $e = (s, v)$ with $v \in S^*$.

(2) There is a split edge $e = (u, v) \notin E$ with $u \in S^*$ and $v \in T_1 \in \mathcal{C}(G' - S^*)$.

(3) There is a split edge $e = (u, v) \notin E$ with $\{u, v\} \subseteq T_1 \in \mathcal{C}(G' - S^*)$ such that $p(G' - S^*) = p((G' - e) - S^*)$ holds.

(4) There is a split edge $e = (u, v) \notin E$ with $u, v \in S^*$. □

If $G'$ satisfies one of the conditions (1) - (4) in Property 17, then split the two edges $\{(s, x), (s, y)\}$ defined in Property 17: set $G' := G' - \{(s, x), (s, y)\} + \{(x, y)\}$, $\overline{G'} := G' - s$, and return to Step II. Otherwise make $G$ $(\ell, k)$-connected by augmenting $\beta(G) - 1$ edges, according to Property 17, and halt after outputting the set of edges added to $G$ as an optimal solution.

Property 18. Each iteration of procedure (A) decreases $\beta(G')$ at least by one, and does not increase $c_{G'}(s)$. □

Procedure (B) 

Choose an arbitrary superleaf $Q_1$ in $G'$. Then there is an edge $(s, x_1)$ with $x_1 \in D_1 \in \mathcal{D}(G')$ such that $D_1 \subseteq Q_1$. Let $S = \Gamma_{G'}(Q_1)$.

(Case-3) $S$ is a shredder in $G'$.

Property 19. In Case-3, there is an edge $(s, x_2)$ with $x_2 \in D_2 \in \mathcal{D}(G')$ such that $D_2 \subseteq T_2 \in \mathcal{C}(G' - S) - \{Q_1\}$ holds and $\{(s, x_1), (s, x_2)\}$ is $\lambda$-splittable and $\kappa$-splittable. □

After splitting two edges $(s, x_1)$ and $(s, x_2)$ in Property 19, set $G' := G' - \{(s, x_1), (s, x_2)\} + \{(x_1, x_2)\}$ and $\overline{G'} := G' - s$, and return to Step II.

(Case-4) $S$ is not a shredder in $G'$.

Property 20. In Case-4, in addition to $Q_1$, there are two distinct superleaves $Q_2$ and $Q_3$ in $G'$ such that $Q_2 \cap S = \emptyset = Q_3 \cap S$ holds, and each of $\{(s, x_1), (s, x_2)\}$, $\{(s, x_2), (s, x_3)\}$, and $\{(s, x_3), (s, x_1)\}$ is $\lambda$-splittable, where $x_2 \in D_2 \cap \Gamma_{G'}(s)$ and $x_3 \in D_3 \cap \Gamma_{G'}(s)$ for cuts $D_i$ with $D_i \in \mathcal{D}(G')$ and $D_i \subseteq Q_i$ for $i = 2, 3$. □

Take the two edges $(s, x_2)$ and $(s, x_3)$ in Property 20. Then Theorem 11 says that at least one of $\{(s, x_1), (s, x_2)\}$, $\{(s, x_2), (s, x_3)\}$, and $\{(s, x_3), (s, x_1)\}$ (say, $\{(s, x_1), (s, x_2)\}$) is $\kappa$-splittable. After splitting the two edges $(s, x_1)$ and $(s, x_2)$, set $G' := G' - \{(s, x_1), (s, x_2)\} + \{(x_1, x_2)\}$ and $\overline{G'} := G' - s$, and return to Step II.

Property 21. Each iteration of procedure (B) decreases $t(G')$ and $c_{G'}(s)$ at least by one, respectively. □
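The splitting operation used throughout Step II, replacing two edges $(s, u)$ and $(s, v)$ by one edge $(u, v)$, can be sketched as follows. The splittability test here is an assumption of this illustration: a pair is accepted when the cut condition (3.1) still holds after the split, and the condition is checked by brute force over all nonempty proper subsets of $V$, so it is only usable on very small graphs. The $\kappa$-splittability test is not covered, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def split(edges, s, u, v):
    """Return G - {(s,u),(s,v)} + {(u,v)} for a multigraph stored as a Counter
    keyed by frozenset({a, b}); requires u != v."""
    g = Counter(edges)
    for a in (u, v):
        key = frozenset((s, a))
        if g[key] == 0:
            raise ValueError(f"no edge between {s!r} and {a!r}")
        g[key] -= 1
    g[frozenset((u, v))] += 1
    return +g  # unary plus drops entries whose count fell to zero

def min_cut_within(edges, V):
    """min c(X) over nonempty proper subsets X of V (edges to s are counted)."""
    V = list(V)
    best = None
    for r in range(1, len(V)):
        for X in combinations(V, r):
            X = set(X)
            c = sum(m for e, m in edges.items() if len(e) == 2 and len(e & X) == 1)
            best = c if best is None else min(best, c)
    return best

def keeps_cut_condition(edges, s, u, v, V, ell):
    """Assumed stand-in for lambda-splittability: (3.1) holds after the split."""
    return min_cut_within(split(edges, s, u, v), V) >= ell

# Cycle a-b-c-d-a with s joined once to every cycle vertex.
V = ["a", "b", "c", "d"]
g = Counter()
for a, b in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")] + [("s", x) for x in V]:
    g[frozenset((a, b))] += 1
```

With $\ell = 3$, splitting the pair at the opposite vertices `a` and `c` keeps every cut at size at least 3, whereas splitting at the adjacent vertices `a` and `b` leaves the cut around `{a, b}` with only two edges.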

Step III (Edge augmentation): Let $G_3 = (V \cup s, E \cup F_3 \cup F_3')$ denote the current multigraph, where $F_3$ denotes the set of split edges, and $F_3' := E_{G_3}(s, V)$. Let $\overline{G_3} := G_3 - s$.

Property 22. In Step III, $t(\overline{G_3}) \le 2\ell + 1$ and $|F_3| + |F_3'|/2 = \lceil \alpha(G)/2 \rceil$ hold, and all cuts $X \subset V$ satisfy $c_{G_3}(X) \ge \ell$. □

Then find a complete edge-splitting at $s$ in $G_3$ according to Theorem 8 to obtain $G_3^* = (V, E \cup F_3^*)$ (ignoring the isolated vertex $s$) with $\lambda(G_3^*) \ge \ell$. Note that $|F_3^*| = \lceil \alpha(G)/2 \rceil$ holds from Property 22. Moreover, by Lemma 2 and Property 22, $G_3^*$ can be made $k$-vertex-connected by adding a set $F_4^*$ of new edges with $|F_4^*| \le 2\ell$. Output $F_3^* \cup F_4^*$ as a solution, where its size satisfies $|F_3^*| + |F_4^*| \le \lceil \alpha(G)/2 \rceil + 2\ell \le \mathrm{opt}(G) + 2\ell$. □



4 Justification of Step I 



Proof of Property 15: We show that a set $F_1$ of edges satisfying Property 15 can be found by applying the algorithm ADD-EDGE [11], as follows.

Let $G_1' = (V \cup s, E \cup F_1')$ be a multigraph such that $F_1'$ is minimal subject to (3.1) and (3.2). Note that $|F_1'| \ge \alpha(G)$ holds, since otherwise $G_1'$ violates (3.1) or (3.2).

If $G_1' - e$ violates (3.1) for an edge $e = (s, u) \in E_{G_1'}(s)$, then there is a $\lambda$-critical cut $X_u \subset V$ with respect to $u$. If $G_1' - e$ violates (3.2) for an edge $e = (s, u) \in E_{G_1'}(s)$, then there is a cut $D \in \mathcal{D}(G)$ satisfying $c_{G_1'}(s, D) = 1$. We call such a cut $D$ $\kappa$-critical. Note that every $\lambda$-critical cut $X$ satisfies $c_{G_1'}(s, X) = \ell - c_G(X)$ and every $\kappa$-critical cut $X$ satisfies $c_{G_1'}(s, X) = k - |\Gamma_G(X)|$ ($= 1$). Hence if there is a subpartition $\mathcal{X}$ of $V$ satisfying $\Gamma_{G_1'}(s) \subseteq \bigcup_{X \in \mathcal{X}} X$ such that every cut $X \in \mathcal{X}$ is $\lambda$-critical or $\kappa$-critical, then $|F_1'| = \alpha(G)$ holds, since $|F_1'| = \sum_{i=1}^{p} (\ell - c_G(X_i)) + \sum_{i=p+1}^{q} (k - |\Gamma_G(X_i)|) \le \alpha(G)$ holds for $\mathcal{X} = \{X_1, \ldots, X_q\}$ from the maximality of $\alpha(G)$.

Otherwise there is a pair of two cuts $X$ and $Y$ with $X \cap Y \ne \emptyset$ such that $X$ is $\lambda$-critical, $Y$ is $\kappa$-critical, and $(Y - \bigcup\{X_i \mid X_i \text{ is } \lambda\text{-critical}\}) \cap \Gamma_{G_1'}(s) \ne \emptyset$ holds, because it can be easily seen that the family of all $\lambda$-critical cuts is laminar and every two cuts in $\mathcal{D}(G)$ are pairwise disjoint. Then we can replace and remove some edges in $E_{G_1'}(s, V)$ to decrease the number of such pairs to zero by applying the algorithm ADD-EDGE [11], since the algorithm ADD-EDGE depends on the property that every two cuts in $\mathcal{D}(G)$ are pairwise disjoint and every $\kappa$-critical cut $D$ satisfies $c_{G_1'}(s, D) = 1$. Thus, we can find a set $F_1$ of edges satisfying Property 15. □
5 Justification of Step II

In Step II, $t(G') \ge 2\ell + 2$ holds and $G'$ satisfies (3.1) and (3.2). First we consider the correctness of Procedure (A). For this, we prove Properties 16 - 18. Since condition (3.3) implies $p(G' - S^*) \ge \ell + 1 \ge k + 1$, Lemma 5 tells that

every superleaf $Q$ in $G'$ satisfies $Q \cap S^* = \emptyset$. (5.1)

Proof of Property 16: Let Qi be a superleafinG' satisfying L^(Qi) = S'*, and choose a vertex 
xi € Qi- Note that such Qi exists, since otherwise Lemma 4 and (5.1) imply that every cut in 
C(G'—S*) contains atleasttwocuts in "D(G') andhence/?(G') < |"t(G')/2] holds, contradicting 
condition (3.3). Let Ni be a set of vertices v € Fq' (s) n (L* — S*) such that {(s, xi), (s, u)} 
is A-splittable. Since t(G') > 2f + 2 and (5.1) imply |/g/(s) n (L* — S*)| > 2f + 2, we have 
|iVi I > 2f + 2^{£ + 2-(fc-l)} = f + fc-lby Corollary 10. Let C*(G' - S*) be a family 
of cuts T 6 C(G' — S*) — Qi which contains a vertex in Ni. 

(1) If there is a cut T e C*(G' — S*) that satisfies cg'{s,T) > 2, then (s,u)} 

with V E T n Ni is A-splittable (from the definition of Ni) and K-splittable (from Theorem 12). 
If G' has another disconnecting set S' S* satisfying p{G' — S') = P{G'), then it is not 
difficult to see from condition (3.3) that G' has exactly one disconnecting set S' S* satisfying 
p(G' — S') = p{G'), and that there is a cut Ti 6 C(G' — S*) which contains {p{G') — 1) 
S'-components, and moreover, by Corollary 10 and P{G') — 1 > f + 1, Ti contains a cut 
T 6 C(G' — S') containing a vertex vi E Ni. 

(2) Otherwise every T 6 C*{G' — S*) satisfies caps, T) = 1. By applying Lemma 4 to 
shredder S* in G', we see that every T 6 C*{G' — S*) is a superleaf in G', which contains exactly 
one vertex in Ni . Now, by assumption, G' has a cut T* E C (G' — S*) satisfying cq' (s,T*) > 2, 
and T* p C*(G' — S*) and T* n Ni = 0 hold. We choose one edge (s, * 2 ) such that X 2 E T* 
(clearly, X 2 p Ni). Then there is an edge (s, v) with v E Ni such that {(s, X 2 ), (s, u)} is A- 
splittable, because | Ai | > £ + k — 1 holds and Corollary 10 says that there are at most (£ + 2) — 
(k — 1) neighbors u of s such that {(s, X 2 ), (s, u)} is not A-splittable. By T* n = 0, vertices 
X 2 E T* and v E Ni are contained in different S'*-components in G'. Thus, {(s, X 2 ), (s, u)} is 
K-splittable by Theorem 12. 

Finally, we see that /3(G" — s) = /?(G') — 1 holds in G" := G' — {(s, u), (s, u)}-|- {(m, u)} 
for a pair of edges {(s, u), (s, u)} chosen in the above (1) and (2), since the edge (u, v) connects 
two distinct 5'-components for a disconnecting set S in G' that satisfies p(G' — S) = P{G'). □ 

Proof of Property 17: Every cut T E C{G' — S*) contains exactly one vertex in Fg' (s), and 
hence is a superleaf in G' from Lemma 4. Hence by (5.1), t(G') = P{G') = |C(G' — 5'*)| 
holds. Now from p(G' — 5”*) > £+1, cg'{X) > p{G' — S'*)|A| >{£+!) holds for every cut 
X C S*. If one of the conditions (1) - (4) holds, then we hook up at most one split edge and/or 
shift at most one edge incident to s, to obtain a multigraph G" in which we can apply Property 16 
to find A-splittable and K-splittable pair of two edges incident to s. 

(1) Let A be a A-critical cut with respect to v (if any). Now G' has a cut Ti 6 C(G' — S*) 
satisfying Ti n X p 0, since every cut YES* satisfies cg'{Y) > £ + 1. The graph G" := 
G' — e + e' resulting from shifting e to an edge e' = (s, vi) with vi E X C\Ti satisfies (3.1) and 

(3.2) by Lemma 7 and by (5.1), respectively. Now cg" (s, Ti) = 2 holds. 

(2) , (3) The graph G" := G' — e -F {(s, u), (s, u)} resulting from hooking up e also satisfies 
condition (3.3), because f(G') = P{G') >2£ + 2 and £> 4 hold. Now cg" (s, Ti) > 2 holds. 

(4) The graph G' — e -f- {(s, u), (s, u)} resulting from hooking up e also satisfies condition 

(3.3) , because f(G') = /?(G') > 2f + 2 and £> 4 hold. Then applying the edge-shifting in (1) to 
the graph G' — e -F {(s, u) , (s, u) }, we obtain a multigraph G" for which cg" (s,Ti) = 2 holds 
for some Ti e C(G" - 5**). 
Finally, if none of (1) - (4) holds, we show that $G$ can be made $(\ell, k)$-connected by adding $\beta(G) - 1$ new edges to $G$. In this case, every $T \in \mathcal{C}(G' - S^*)$ satisfies $c_{G'}(s, T) = 1$,
Eg'{s,S*) = 0 holds, and every edge e = (u,v) € F' satisfies {u,v} C V — S* and 
p((G' — e) — S*) = p{G' — S*) + 1, where F' is the set of all split edges (note F' C\F = 0). We 
easily see that k(G' + F 2 ) > k holds for any new edge set F 2 such that (N*,F 2 ) is a spanning 
tree for N* = Fa' (s). We choose an F 2 as follows. Since G' satisfies (3.1), by a complete edge- 
splitting at s in G' as described in Theorem 8, we can obtain G" = {V,F VJ F") (ignoring the 
isolated vertex s) with A(G") > 1. Note that in G" , every split edge e = (m, v) € F" satisfies 
{u, u} C If — S'* andp((G" — e) — S*) = p(G" — S*) + 1, because every newly split edge 
connects some F and Tj satisfying Ti,Tj 6 C(G' — S*) and F ^ F- Therefore p(G — S*) = 
p{{G" - F”) - S*) = p(G" - S*) + \F"\ holds. Let C(G" - S*) = {T[,F, ■ ■ • ,T^}, where 
b = piG” — S*). Then k(G" + F*) > k holds for F* = {(xi,Xi+i) | i = 1, • • • , b — 1}, 
where Xi is a vertex in T/ n N*, since {N*,F 2 = {F" — F') U F*) is a spanning tree. Note that 
\F"\ + |F*| = \F"\ + p{G" -S*)-l= p{G - S*) - 1 < /3(G) - 1 implies that F” U F* 
is an optimal solution since \F" U F* \ attains a lower bound /3(G) — 1. □ 

Proof of Property 18: Each iteration of procedure (A) decreases $\beta(G')$ at least by one from Property 16, and does not increase $c_{G'}(s)$, since hooking up does not increase $\beta(G')$ and at most one edge is hooked up in each case. □

Next we prove Properties 19 - 21 for the correctness of Procedure (B). Now $\beta(G') \le \lceil t(G')/2 \rceil$ holds.

Proof of Property 19: Assume that the property does not hold. Let $\mathcal{A}_1$ denote the family of superleaves $Q$ in $G'$ satisfying $Q \cap S = \emptyset$. Then, by Theorem 12 and Corollary 10, every $Q \in \mathcal{A}_1$ either satisfies $Q \in \mathcal{C}(G' - S)$ with $c_{G'}(s, Q) = 1$, or satisfies $\Gamma_{G'}(s) \cap D \subseteq X$ for some $D \in \mathcal{D}(G')$ with $D \subseteq Q$ and a maximal cut $X \subset V$ satisfying $c_{G'}(s, X) \le \ell + 2 - (k - 1)$ and containing $x_1$ and all vertices $v \in \Gamma_{G'}(s)$ such that $\{(s, x_1), (s, v)\}$ is not $\lambda$-splittable. Therefore $|\mathcal{A}_1| \le \ell + 3 - k + p(G' - S)$ holds.

If $p(G' - S) \ge k$ holds, then Lemma 5 implies that every superleaf $Q$ in $G'$ satisfies $Q \in \mathcal{A}_1$, and hence $|\mathcal{A}_1| = t(G')$ holds. This implies $t(G') = |\mathcal{A}_1| \le \ell + 3 - k + p(G' - S) \le \ell + 3 - k + \beta(G')$. From this and $\beta(G') \le \lceil t(G')/2 \rceil$, we have $\lceil t(G')/2 \rceil \le \ell + 4 - k$ ($k \ge 4$), contradicting $t(G') \ge 2\ell + 2$.

If $p(G' - S) \le k - 1$, then $|\mathcal{A}_1| \le \ell + 3 - k + p(G' - S) \le \ell + 2$ holds. Since $|S| = k - 1$ holds and every two superleaves in $G'$ are pairwise disjoint, we have $t(G') \le |\mathcal{A}_1| + k - 1 \le \ell + k + 1 \le 2\ell + 1$, contradicting $t(G') \ge 2\ell + 2$. □

Proof of Property 20: We can see this property from Corollary 10 and $|\{Q \subset V \mid Q \text{ is a superleaf in } G' \text{ with } Q \cap S = \emptyset\}| \ge 2\ell - k + 2$. Here we omit the details. □

Proof of Property 21: For the resulting graph $G''$, $c_{G''}(s) = c_{G'}(s) - 2$ clearly holds. Moreover, the decrease of $t$ by at least one follows from Lemma 6 and Theorem 12. □



6 Complexity 

First the family $\mathcal{D}(G)$ of all minimal tight sets and superleaves in $G$ can be computed in $O(\min\{k - 1, \sqrt{n}\} mn)$ time by using the standard network flow technique $n$ times [4]. If $t(G) \le 2\ell + 1$, then $F_0$ can be computed in $O(\min\{k, \sqrt{n}\} k n^3)$ time by applying Phase 5 of Jordán's algorithm in [13], and $F_0'$ can be computed in $O((m + n \log n) n \log n)$ time, by using the algorithm AUGMENT in [19]. If $t(G) \ge 2\ell + 2$, $G_1$ can be computed in $O(n^2 m + n^3 \log n)$ time, since the procedure is based on the algorithm ADD-EDGE* [11], which takes $O(n^2 m + n^3 \log n)$ time.

In Step II, max{|r^(Q) | |Q is a superleaf in G'} = /?(G') if (3.3) holds, as mentioned in 
the proof of Property 16. So whether (3.3) holds or not can be checked and a disconnecting set 
S* with p(G' — S*) = I3{G') can be found if (3.3) holds, by computing max{|E^(Q)| \Q is 
a superleaf in G'}. In the last case of procedure (A), it is known in [19] that a complete splitting 
can be found in 0(n(m + n log n) log n) time. The remaining procedure of Step II is executed 
in $O(n^4)$ time as follows. From Property 18 (resp., Property 21), the number of iterations of procedure (A) (resp., (B)) is at most $n$ by $\beta(G) \le n$ (resp., $t(G) \le n$). In each iteration, we
can find a K-splittable pair in 0(min{fc — 1 , y^}m) time by using the standard network flow 
technique [4]. For the vertex xi € Fg' (s), a maximal dangerous cut A C with {xi,y~\ C X 
can be found in 0{n^) time by computing a maximum flow between two vertices xi and s in 
G' — {(s, xi), (s, y)} + {xi,y) [16] (if any). Hence we can find a A-splittable pair in 0(n^) time. 
Note that, in procedure (A), a A-critical cut A C H can be found in 0(n(m + n log n)) time by 
using the algorithm AUGMENT in [19]. 

Summarizing the argument given so far, Theorem 14 is now established. □



7 Concluding Remarks 

In this paper, we gave a polynomial time algorithm for augmenting a given $(k - 1)$-vertex-connected multigraph $G$, $k \ge 4$, to an $\ell$-edge-connected and $k$-vertex-connected graph by adding at most $2\ell$ surplus edges over the optimum. However, if $\ell = k \ge 4$, there is an algorithm [13,14] that produces at most $(k - 2)/2$ surplus edges over the optimum. Therefore, closing the gap between this and our bound is left as future work.
Abstract. We consider the following map labelling problem: given distinct points $p_1, p_2, \ldots, p_n$ in the plane, find a set of pairwise disjoint axis-parallel squares $Q_1, Q_2, \ldots, Q_n$ where $p_i$ is a corner of $Q_i$. This problem reduces to that of finding a maximum independent set in a graph.

We present a branch and cut algorithm for finding maximum independent sets and 
apply it to independent set instances arising from map labelling. The algorithm 
uses a new technique for setting variables in the branch and bound tree that implicitly
exploits the Euclidean nature of the independent set problems arising from
map labelling. Computational experiments show that this technique contributes to 
controlling the size of the branch and bound tree. We also present a novel variant of 
the algorithm for generating violated odd-hole inequalities. Using our algorithm 
we can find provably optimal solutions for map labelling instances with up to 950 
cities within modest computing time, a considerable improvement over the results 
reported on in the literature. 



1 Introduction 

When designing maps an important question is how to place the names of the cities on 
the map such that each city name appears close to the corresponding city, and such that 
no two names overlap. Various problems related to this question are referred to as map 
labelling problems. 

A basic map labelling problem is described as follows: given a set $P = \{p_1, p_2, \ldots, p_n\}$ of $n$ distinct points in $\mathbb{R}^2$, determine the supremum $\sigma^*$ of all reals $\sigma$ for which there are $n$ pairwise disjoint, axis-parallel $\sigma \times \sigma$ squares $Q_1, Q_2, \ldots, Q_n$, where $p_i$ is a corner of $Q_i$ for all $i = 1, \ldots, n$. By "pairwise disjoint squares" we mean that no overlap between any two squares is allowed. Once the squares are known they define the boundaries of the area where the labels can be placed. The decision variant (DP) of this problem is for fixed $\sigma$ to decide whether there exists a set of squares $Q_1, \ldots, Q_n$ as described above. Formann and Wagner [13] showed that problem DP is NP-complete. Kucera et al. [18] observed that there are only $O(n^2)$ possible values that $\sigma^*$ can take. Optimising over those can be done by solving only $O(\log n)$ problems DP with different $\sigma$ using binary search. Moreover, they present an algorithm that solves the map labelling problem DP.
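The binary search over candidate label sizes can be sketched as follows. This is a generic illustration under two assumptions that the text does not spell out here: the candidate values are available as a sorted list, and feasibility is monotone (if size $\sigma$ admits a labelling, so does every smaller candidate). The oracle `feasible` stands in for a solver of problem DP and is hypothetical.

```python
def largest_feasible(candidates, feasible):
    """Return the largest candidate accepted by `feasible`, or None.

    `candidates` must be sorted in increasing order and `feasible` must be
    monotone: once it rejects a value it rejects all larger ones.
    The oracle is called O(log len(candidates)) times.
    """
    lo, hi = 0, len(candidates) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if feasible(candidates[mid]):
            best = candidates[mid]  # feasible: try larger sizes
            lo = mid + 1
        else:
            hi = mid - 1            # infeasible: try smaller sizes
    return best

# Toy oracle standing in for a DP solver: sizes up to 2.5 are "feasible".
calls = []
def toy_oracle(sigma):
    calls.append(sigma)
    return sigma <= 2.5

answer = largest_feasible([0.5, 1, 2, 3, 4], toy_oracle)
```

With $O(n^2)$ candidates the loop makes $O(\log n^2) = O(\log n)$ oracle calls, which matches the count stated above.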

* This research was (partially) supported by ESPRIT Long Term Research Project 20244 (project
ALCOM IT: Algorithms and Complexity in Information Technology).
We study the following generalisation (ODP) of problem DP: given $\sigma$, find as many pairwise disjoint squares as possible. Clearly, a solution to problem DP exists if and only if a solution to problem ODP exists in which $n$ squares are found. An advantage of studying ODP instead of DP is that a feasible solution to ODP actually represents a partial map labelling. Our experience shows that it is relatively easy to find feasible solutions to ODP of good quality. Since DP is NP-complete, problem ODP is NP-hard, and therefore we do not expect that there exists a polynomial time algorithm for solving ODP. So, we have to resort to enumerative methods such as branch and bound if we want to solve ODP to optimality, unless P = NP. Formann and Wagner [13] developed a 1/2-approximation algorithm for ODP. Different heuristic algorithms (including simulated annealing) are discussed by Christensen et al. [8]. Van Dijk et al. [10] considered genetic algorithms, and Wagner and Wolff [26] propose a hybrid heuristic. Cromly [9] proposed a semi-automatic LP based approach for finding feasible solutions to ODP. Zoraster [29,30] used Lagrangean relaxation to make a heuristic algorithm for ODP. We formulate ODP as an independent set problem (see Section 1.1) and develop a branch and cut algorithm for solving ODP to optimality. Branch and cut is a branch and bound algorithm in which a so-called cutting plane algorithm may be called in every node of the branch and bound tree. For readers unfamiliar with branch and cut algorithms, we give a brief description in Section 2.1.

1.1 The Independent Set Formulation 

An independent set $S$ in a graph $G$ is a set of nodes such that no two nodes in $S$ are adjacent in $G$. Problem ODP can be formulated as a maximum cardinality independent set problem on the conflict graph $G_{P,\sigma} = (V_P, E_{P,\sigma})$ associated with the map labelling problem: for each $Q_i$, $i = 1, \ldots, n$, we add four nodes to $V_P$, corresponding to the possible placements of square $Q_i$. We add edge $\{u, v\}$ to $E_{P,\sigma}$ if $u$ and $v$ correspond to placements of $Q_i$ and $Q_j$ for some $i \ne j$ with $Q_i \cap Q_j \ne \emptyset$, or to distinct placements of $Q_i$ for some $i$. Using this construction, any valid placement of the squares $Q_1, \ldots, Q_n$ corresponds to an independent set in $G_{P,\sigma}$ of size $n$, and vice versa. This relation to the independent set problem has been used to derive polynomial time approximation algorithms [1,4] for problem ODP. Kakoulis and Tollis [17] show how to approach more general map labelling problems than problem ODP using essentially an independent set formulation.
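The construction of the conflict graph can be sketched as follows. The sketch makes one assumption the text leaves open: two squares belonging to different points are taken to conflict when their interiors overlap, so squares that merely touch along a boundary do not conflict. Nodes are pairs `(i, j)` meaning placement `j` of the square for point `i`; these names are illustrative only.

```python
from itertools import combinations

def placements(p, sigma):
    """The four sigma-by-sigma squares having p as a corner,
    each given as (xmin, ymin, xmax, ymax)."""
    x, y = p
    return [(x + dx, y + dy, x + dx + sigma, y + dy + sigma)
            for dx in (-sigma, 0) for dy in (-sigma, 0)]

def overlap(a, b):
    """True if the interiors of axis-parallel squares a and b intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def conflict_graph(points, sigma):
    """Return (nodes, edges) of the conflict graph for the given label size."""
    nodes = [(i, j) for i in range(len(points)) for j in range(4)]
    square = {(i, j): placements(points[i], sigma)[j] for (i, j) in nodes}
    edges = set()
    for u, v in combinations(nodes, 2):
        same_point = u[0] == v[0]  # distinct placements of the same square
        if same_point or overlap(square[u], square[v]):
            edges.add((u, v))
    return nodes, edges

far_nodes, far_edges = conflict_graph([(0, 0), (10, 10)], 1)
near_nodes, near_edges = conflict_graph([(0, 0), (1, 0)], 1)
```

For two far-apart points only the six edges inside each group of four placements appear (12 in total); for the two points at distance 1, two additional edges join the placements that occupy the same unit square between them.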

The complement of a graph $G = (V, E)$ has the same node set as $G$ and is denoted by $\overline{G} = (V, \overline{E})$, where $\{u, v\} \in \overline{E}$ if and only if $\{u, v\} \notin E$. A set of nodes $S \subseteq V$ is a maximum independent set in $G$ if and only if $S$ is a maximum clique in $\overline{G}$. Several optimisation algorithms for finding maximum independent sets [24,25,20,22,28,14] and maximum cliques [5,2,3,7,6] have been reported on in the literature. The independent set formulation and the clique formulation are equally suitable for algorithmic design. In our study we use the independent set formulation.

Only labels that are positioned closely together intersect each other, which means that the independent set instances that arise from map labelling are very sparse. Although sparse graphs are notoriously hard for the independent set codes reported on in the literature (sparse problems with only 200 nodes are considered hard), the graphs arising from map labelling are different because they have a nice topological structure, namely, the conflict graph can easily be embedded in the Euclidean plane such that the edges of the graph only connect nodes that are close together (e.g., by placing a node in the center of the square it represents).

1.2 Contribution and Outline 

Our main contribution is that we demonstrate that it is possible to solve map labelling 
instances with up to 950 cities (or 3800 nodes) quickly using our branch and cut algorithm 
for the independent set problem. The optimisation algorithms reported on so far are 
useless for map labelling instances with more than 50 cities [8, page 219]. We show that 
it is possible to enhance the standard branch and cut algorithm with a recursive technique 
that is applicable for the Euclidean instances arising from map labelling. Finally, we 
present a novel variant of lifting odd hole inequalities based on path decomposition. 
In Section 2 we describe our branch and cut algorithm. Computational experiments on 
standard test problems from the map labelling literature are reported on in Section 3 . 



2 An Algorithm for Maximum Independent Sets 

We begin by giving a brief description of a branch and cut algorithm in Section 2.1 
directed to readers less familiar with this topic. In Section 2.2 we specify the details of 
our implementation of the branch and cut algorithm. This includes how we set values 
of variables in the branch and cut algorithm in order to speed up the algorithm, and 
a new technique that makes use of the possibility to decompose a problem in a node 
of the search tree yielding small easy-to-solve integer programs that we can solve to 
optimality. This enhancement proved very useful for the map labelling instances as they, 
due to their sparsity, quite often give rise to decomposable subproblems once a number 
of variables have been set. We conclude this section by giving the separation algorithms 
for two families of inequalities, namely clique and odd hole inequalities. In Section 2.3 
we describe a local search algorithm for finding good feasible solutions, and thereby
lower bounds on the optimal value of our problem. In this section we use $n = |V|$ and
$m = |E|$ to denote the number of vertices and edges, respectively, of the graph used in
the independent set formulation of ODP.

2.1 Branch and Cut

A branch and cut algorithm is a branch and bound algorithm where we may call a cutting 
plane algorithm in each node of the search tree. Here we give a short description of a 
basic version of branch and bound and of a cutting plane algorithm. For further details 
we refer to the book by Wolsey [27]. 

Consider the problem to determine $z_{\mathrm{opt}} = \max\{z(x) : x \in P, x \text{ integer}\}$, where $z$ is a linear function in $x$, and where $P$ is a polyhedron. We refer to this problem as problem $\Pi$. Branch and bound makes use of the linear programming relaxation (LP-relaxation) $\overline{\Pi}$: $\overline{z} = \max\{z(x) : x \in P\}$. It is easy to see that $\overline{z} \ge z_{\mathrm{opt}}$. At the top level, or root node, of the branch and bound tree we have problem $\overline{\Pi}$. At level $k$ of the tree we have a collection of problems, say $\overline{\Pi}_1, \ldots, \overline{\Pi}_l$, such that the corresponding polyhedra $P_1, \ldots, P_l$ are pairwise disjoint, and such that all integral vectors in $P$ are contained in $P_1 \cup \cdots \cup P_l$.

The algorithm works as follows. We maintain a set of open problems, the best known value of an integer solution $z^* = z(x^*)$, and the corresponding integer solution $x^*$. At first, problem $\overline{\Pi}$ is the only open problem. In iteration $i$, we select an open problem $\overline{\Pi}^i$ and solve it. If problem $\overline{\Pi}^i$ is infeasible we remove $\overline{\Pi}^i$ from the list of open problems and continue with the next iteration. If the optimal solution to $\overline{\Pi}^i$, $\overline{x}^i$, is integral, i.e., $\overline{x}^i$ is a solution to problem $\Pi$, we set $x^* := \overline{x}^i$ if $z(\overline{x}^i) > z^*$, we remove $\overline{\Pi}^i$ from the list of open problems and we proceed to the next iteration. Otherwise, we identify a component $j$ of the vector $\overline{x}^i$ such that $\overline{x}^i_j$ is not integral, and "branch" on $x_j$, i.e., we formulate two new open problems, say $\overline{\Pi}^i_0$ and $\overline{\Pi}^i_1$, by adding constraints $x_j \le \lfloor \overline{x}^i_j \rfloor$ and $x_j \ge \lceil \overline{x}^i_j \rceil$ to $P^i$. If $x_j$ is a 0-1 variable we add constraints $x_j = 0$ and $x_j = 1$. Note that $\overline{x}^i$ neither belongs to $P^i_0$ nor to $P^i_1$. The value of $z(\overline{x}^i)$ is an upper bound on the value of any solution to the problems $\overline{\Pi}^i_0$ and $\overline{\Pi}^i_1$. We replace $\overline{\Pi}^i$ by $\overline{\Pi}^i_0$ and $\overline{\Pi}^i_1$ in the set of open problems and proceed to the next iteration. The algorithm stops after the set of open problems becomes empty.
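The control flow of branch and bound (incumbent, pruning by an upper bound, branching on a 0-1 variable) can be illustrated on the maximum independent set problem with the following sketch. It is not the LP-based method described above: the LP-relaxation bound is replaced by a much cruder bound (the number of nodes still undecided), open problems are explored depth-first by recursion rather than kept in an explicit list, and the function name is hypothetical.

```python
def max_independent_set(nodes, edges):
    """Branch and bound for maximum independent set with a counting bound."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best = []  # incumbent x*, stored as a list of chosen nodes

    def branch(chosen, free):
        nonlocal best
        # Upper bound for this subproblem: take every undecided node.
        # Prune if even that cannot improve on the incumbent.
        if len(chosen) + len(free) <= len(best):
            return
        if not free:
            best = list(chosen)  # all variables fixed and strictly better
            return
        # Branch on the undecided node with most undecided neighbours.
        v = max(free, key=lambda w: len(adj[w] & free))
        branch(chosen + [v], free - adj[v] - {v})  # subproblem with x_v = 1
        branch(chosen, free - {v})                 # subproblem with x_v = 0

    branch([], set(nodes))
    return best

cycle5 = [(i, (i + 1) % 5) for i in range(5)]
solution = max_independent_set(range(5), cycle5)
```

On the 5-cycle the search returns an independent set of size 2. A stronger bound, such as the LP-relaxation value used in the text, would prune far more of the tree than the counting bound used here.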

When using branch and bound to solve integer programming problems it is crucial that we obtain good lower and upper bounds on the optimal value, as the bounds are used to prune the search tree. In order to obtain good upper bounds we strengthen the LP-relaxation by adding valid inequalities. Let $X = P \cap \mathbb{Z}^n$, i.e., $X$ is the set of feasible solutions to $\Pi$, and let $\mathrm{conv}(X)$ denote the convex hull of feasible solutions. Observe that $z_{\mathrm{opt}} = \max\{z(x) : x \in \mathrm{conv}(X)\}$. An inequality $\pi^T x \le \pi_0$ is valid for $\mathrm{conv}(X)$ if $\pi^T x \le \pi_0$ for all $x \in \mathrm{conv}(X)$. For any valid inequality $\pi^T x \le \pi_0$, the set $\{x \in \mathrm{conv}(X) : \pi^T x = \pi_0\}$ is called a face of $\mathrm{conv}(X)$ if it is nonempty and not equal to $\mathrm{conv}(X)$. A face is called a facet of $\mathrm{conv}(X)$ if it is not contained in any other face of $\mathrm{conv}(X)$. The facets of $\mathrm{conv}(X)$ are precisely the inequalities that are necessary in the description of $\mathrm{conv}(X)$. If the problem of optimising over $X$ is NP-hard, then we cannot expect to find an explicit description of $\mathrm{conv}(X)$ unless NP = co-NP. In practice we therefore limit ourselves to certain families of facet-defining, or at least high-dimensional, valid inequalities. Given a family $\mathcal{F}$ of valid inequalities and a vector $x$, the problem of determining whether there exists an inequality belonging to $\mathcal{F}$ that is violated by $x$ is called the separation problem based on $\mathcal{F}$. Even if it is NP-hard to optimise over $X$, the separation problem based on a specific family of valid inequalities for $\mathrm{conv}(X)$ might be polynomially solvable.

In a cutting plane algorithm we maintain an LP-relaxation of a problem $\Pi$. We start with the formulation $P$. Solve $\overline{\Pi}$ given $P$. If the optimal solution $\overline{x}$ is integral, then we stop. Otherwise, we call the separation algorithms based on all families of valid inequalities that we consider. If any violated inequalities are identified we add them to $P$ and solve again. If no violated inequalities are found we stop.



2.2 Branch and Cut for the Independent Set Formulation

Here we describe the parts of the branch and cut algorithm that are specific for our
problem and for our implementation. We use the following notation. For any set $S$
and finite discrete set $I$, we use $x \in S^I$ to denote a vector $x$ of dimension $|I|$ whose
components are indexed by the elements of $I$. For any $I' \subseteq I$ we use the implicit sum
notation $x(I')$ to denote $\sum_{i \in I'} x_i$.

Given a graph $G = (V, E)$, the set of all incidence vectors of independent sets in $G$
is given by

$$X_{IS} = \{x \in \{0,1\}^V \mid \forall \{v,w\} \in E : x(\{v,w\}) \le 1\}.$$

We consider the following integer programming version of the maximum cardinality
independent set formulation.

$$\max\{z(x) = x(V) : x \in X_{IS}\}.$$
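To make the formulation concrete, the following minimal sketch checks membership in $X_{IS}$ and maximises $x(V)$ by plain enumeration. It is only an illustration for very small graphs, not the branch and cut method of this section; the function names and the 5-cycle instance are assumptions introduced here.

```python
from itertools import product

def is_independent(x, edges):
    """Check the edge constraints x_v + x_w <= 1 that define X_IS."""
    return all(x[v] + x[w] <= 1 for v, w in edges)

def max_independent_set(nodes, edges):
    """Enumerate all 0/1 vectors and keep a feasible one maximising x(V)."""
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, bits))
        if is_independent(x, edges) and (best is None or sum(bits) > sum(best.values())):
            best = x
    return best

# Hypothetical instance: a 5-cycle, where at most floor(5/2) = 2 nodes fit.
cycle_nodes = [0, 1, 2, 3, 4]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
x_best = max_independent_set(cycle_nodes, cycle_edges)
```

Enumeration takes time exponential in $|V|$, which is exactly why the bounds and cutting planes discussed here matter for realistic instances.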

In a certain node $i$ of the branch and cut tree, all the variables that we have been
branching on in order to create node $i$ have been set to either zero or one. In order to
trigger pruning of the search tree, either through integrality or through infeasibility, it is
important to try to set other variables equal to zero or one as well. We have implemented
three variable setting algorithms: setting by reduced costs, by logical implications, and
the new scheme variable setting by recursion and substitution. We will return to these
algorithms later in this section.

We use two families of facet-defining valid inequalities in the cutting plane algorithm:
clique inequalities and lifted odd hole inequalities. Let $C \subseteq V$ be a subset of the nodes
that induces a clique, i.e., a complete subgraph, in $G$. Padberg [23] showed that the clique
inequality $x(C) \le 1$ is valid for conv($X_{IS}$), and that it defines a facet of conv($X_{IS}$) if
and only if $C$ is maximal. We call an odd length chordless cycle in $G$ an odd hole of
$G$. Let $H \subseteq V$ be a subset of the nodes that induces an odd hole in $G$. It was shown
by Padberg that the odd hole inequality $x(H) \le \lfloor |H|/2 \rfloor$ is valid for conv($X_{IS}$) and
that it defines a facet for conv($X_{IS}$) $\cap\ \{x \in \mathbb{R}^V \mid x_v = 0 \text{ for all } v \in V \setminus H\}$. Hence,
in general odd hole inequalities do not define facets of conv($X_{IS}$). Padberg suggested a
way of increasing the dimension of the faces induced by the odd hole inequalities, called
lifting. The separation algorithms based on the two families of inequalities together with
our variant of Padberg's lifting scheme for odd hole inequalities are described at the end
of this section.

In our branch and cut algorithm we also incorporated a local search algorithm for
finding feasible solutions of good quality, and thereby good lower bounds on the optimal 
value. This local search algorithm is described in Section 2.3. 

At the initialisation of the branch and cut algorithm we consider a stronger linear
relaxation than the linear relaxation obtained from $X_{IS}$. We observe that every pair of
vertices connected by an edge in the conflict graph is a clique on two nodes. This clique is,
however, in general not maximal. Therefore we substitute each constraint $x(\{u,v\}) \le 1$
for all $\{u,v\} \in E$ by a constraint based on a maximal clique containing the edge $\{u,v\}$.
Let $\mathcal{C}$ be a collection of maximal cliques in $G$ such that each edge of $G$ is contained in
at least one clique in $\mathcal{C}$. Define $P_{\mathcal{C}} = \{x \in \mathbb{R}^V : x(C) \le 1 \text{ for all } C \in \mathcal{C},\ x \ge 0\}$,
and let $\Pi_{\mathcal{C}}$ be defined as $\max\{z(x) : x \in P_{\mathcal{C}},\ x \text{ integer}\}$. As in Section 2.1 we define
the LP-relaxation of problem $\Pi_{\mathcal{C}}$ to be the problem $\bar{\Pi}_{\mathcal{C}}$: $\max\{z(x) : x \in P_{\mathcal{C}}\}$. As
mentioned in Section 2.1 the branch and cut algorithm maintains a collection $Q$ of open
problems. We initialise $Q := \{P_{\mathcal{C}}\}$.

The only algorithmic details that remain to be defined are how to select a branching
variable and how to select the next open problem. We select a variable to branch on based
on pseudo costs as suggested by Linderoth and Savelsbergh [19]. The open problem $\Pi^i$
to remove from $Q$ is chosen using the best value node selection rule. Furthermore, we
give precedence to node $i$ if we found a better heuristic solution in node $p(i)$.
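The clique-based initial relaxation described above needs a collection of maximal cliques covering every edge. The sketch below shows one simple greedy way to build such a collection. The text does not say how the collection is constructed in the implementation, so the greedy extension rule (always add the smallest common neighbour) and the function name are assumptions made for illustration.

```python
def greedy_clique_cover(adj):
    """Return maximal cliques such that every edge lies in at least one of them.

    adj maps each node to the set of its neighbours (nodes must be sortable).
    """
    covered = set()   # edges already contained in some chosen clique
    cliques = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v and frozenset((u, v)) not in covered:
                clique = {u, v}
                candidates = adj[u] & adj[v]      # common neighbours so far
                while candidates:
                    w = min(candidates)
                    clique.add(w)
                    candidates &= adj[w]          # also drops w itself
                members = sorted(clique)
                for i, a in enumerate(members):
                    for b in members[i + 1:]:
                        covered.add(frozenset((a, b)))
                cliques.append(frozenset(clique))
    return cliques

# Hypothetical graph: a triangle 0-1-2 with a pendant edge 2-3.
example_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cover = greedy_clique_cover(example_adj)
```

Each returned clique is maximal because the loop only stops when no node is adjacent to all current members.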



Variable Setting and SRS. Let $(x^{LP}, \pi)$ be an optimal primal-dual pair for an LP
relaxation in node $i$ and let $c^\pi$ denote the reduced cost vector. If for some $v \in V$ we
have $x^{LP}_v = 0$ (or $x^{LP}_v = 1$) and $c^\pi_v < z(x^*) - z(x^{LP})$ (or $c^\pi_v > -(z(x^*) - z(x^{LP}))$), we
can set $x^j_v := 0$ (or $x^j_v := 1$, respectively) in all nodes $j$ in the part of the branch and
bound tree rooted at node $i$. We denote the neighbourhood of a node $v$ in $G$ by $N(v)$,
i.e., $N(v) = \{u \in V \mid \{u,v\} \in E\}$. If, in a node $j$ of the branch and bound tree, we
have that $x^j_v$ is set to 1 for some $v$ (either by branching or by reduced cost), we can set
$x^j_u := 0$ for all $u \in N(v)$. If on the other hand, there exists $v$ such that $x^j_u$ is set to 0 for
all $u \in N(v)$, we can set $x^j_v := 1$.
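The two logical implications above (a node set to 1 forces its neighbours to 0, and a node whose neighbours are all set to 0 can be set to 1) can be sketched as a small fixed-point loop. This is an illustrative sketch only; the dictionary-based representation and the name `propagate` are assumptions, and reduced cost setting is not covered.

```python
def propagate(adj, fixed):
    """Extend a partial 0/1 assignment by the two logical implications.

    adj maps each node to its neighbour set; fixed maps some nodes to 0 or 1.
    """
    fixed = dict(fixed)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if fixed.get(v) == 1:
                for u in adj[v]:
                    if fixed.get(u) == 1:
                        raise ValueError("two adjacent nodes are both set to 1")
                    if u not in fixed:
                        fixed[u] = 0          # a neighbour of a chosen node
                        changed = True
            elif v not in fixed and all(fixed.get(u) == 0 for u in adj[v]):
                fixed[v] = 1                  # all neighbours are excluded
                changed = True
    return fixed

# Hypothetical instance: the path 0-1-2-3 with node 1 set to 1 by branching.
path_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
result = propagate(path_adj, {1: 1})
```

In the example, setting node 1 forces nodes 0 and 2 to zero, after which node 3 has only excluded neighbours and is set to one.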

Observe that deciding whether to set a variable based on reduced cost depends on
the gap $z(x^{LP}) - z(x^*)$. This gap decreases every time we find a new $x^*$. We try to set
variables in every node of the branch and bound tree whenever this occurs. This is done
in a "lazy" fashion. For this purpose, we store a tree $T$ that mirrors the branch and
bound tree. For each node in the branch and bound tree, we store the variables that we
can set in its corresponding node in $T$, together with the value $z(x^*)$ for which we last
applied the variable setting procedure in that node. Suppose we decide to solve problem
$\Pi^i$ for some node $i \in T$. For each node $j$ on the path from the root of $T$ to node $i$,
if $z(x^*)$ has improved since the last time we applied variable setting in node $j$, then
we reconstruct the final LP reduced cost of node $j$ and re-apply the variable setting
procedure. Reconstructing the final LP reduced cost can be done using the final LP basis
of node $j$. Storing a basis in each node of $T$ would require $O(n + m)$ memory per node,
which may become prohibitive if the tree size becomes large. In each node $j$, however,
it suffices to store the difference between its final basis and the final basis of its parent
node $p(j)$.

Another, even more important enhancement that we developed is referred to as
variable setting by recursion and substitution, or SRS. We call a variable that is not set to
0 or 1 free. Focus on node $i$ after the cutting plane algorithm is done. Denote the subset
of nodes in $G$ corresponding to free variables in node $i$ of the branch and bound tree
by $F^i$. We check whether the graph $G(F^i) = (F^i, E(F^i))$ is disconnected. If this is
the case, denote the connected components of $G(F^i)$ by $G(F^i_1), \ldots, G(F^i_k)$ such that
$|F^i_j| \le |F^i_{j+1}|$ for $j = 1, \ldots, k-1$. We identify the largest component, $G(F^i_k)$, as the
main component of the problem in node $i$, which we leave aside for the moment. Our
goal is to exploit the fact that we can find maximum independent sets on the smaller
components easily. Denote the LP solution $x^{LP}$ restricted to $W \subseteq V$ by $x^{LP}_W$, i.e.
$(x^{LP}_W)_v = x^{LP}_v$ if $v \in W$ and $(x^{LP}_W)_v = 0$ if $v \notin W$. For all $j = 1, \ldots, k-1$, if
$x^{LP}_{F^i_j}$ is integer we set $\hat{x}^j := x^{LP}_{F^i_j}$. Otherwise we recursively find a maximum
independent set with incidence vector $\hat{x}^j$ on $G(F^i_j)$. Furthermore, we substitute these
partial solutions back into $x^{LP}$ to give us $\tilde{x}$ as follows:

$$\tilde{x} := x^{LP} - \sum_{j=1}^{k-1} x^{LP}_{F^i_j} + \sum_{j=1}^{k-1} \hat{x}^j.$$

Proposition 1. There exists an optimal solution $x$ to $\Pi^i$ with $x_v = \tilde{x}_v$ for all
$v \in \bigcup_{j=1}^{k-1} F^i_j$.

For the proof, see the full version of the paper. It follows that we can set $x^{LP} := \tilde{x}$,
thereby tightening $z(x^{LP})$. Finally, for any node that is a descendant of node $i$, we set the
variables $x_v$ for $v \in \bigcup_{j=1}^{k-1} F^i_j$ to $\tilde{x}_v$.
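The first step of SRS, splitting the graph induced by the free variables into connected components and singling out the largest one, can be sketched as follows. This is a minimal illustration under assumed data structures (an adjacency dictionary and a set of free nodes); the recursive solution of the small components and the substitution step are not shown.

```python
def free_components(adj, free):
    """Connected components of the subgraph induced by the free nodes.

    Components are returned as sorted node lists, ordered by non-decreasing
    size, so the last entry plays the role of the main component.
    """
    free = set(free)
    seen = set()
    components = []
    for start in sorted(free):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for u in adj[v]:
                if u in free and u not in seen:   # ignore variables already set
                    seen.add(u)
                    stack.append(u)
        components.append(sorted(component))
    components.sort(key=len)
    return components

# Hypothetical node of the tree: node 6 is already set, nodes 0..5 are free.
srs_adj = {0: {1}, 1: {0, 2}, 2: {1, 6}, 3: {4, 6}, 4: {3}, 5: set(), 6: {2, 3}}
parts = free_components(srs_adj, {0, 1, 2, 3, 4, 5})
small_components, main_component = parts[:-1], parts[-1]
```

Because node 6 is not free, the graph on the free nodes falls apart into three pieces even though the original graph has only two components.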

Separation and Lifting. For the separation of clique inequalities we follow the ideas
of Nemhauser and Sigismondi [22]. We refer to the full paper for the technical details
regarding our precise implementation.

Let $x$ denote a fractional solution. We start by describing the separation algorithm
for the basic odd hole inequalities $x(H) \le \lfloor |H|/2 \rfloor$ given the vector $x$. We first find an
odd cycle starting from some node $v \in V$ using the construction described by Grötschel,
Lovász, and Schrijver [15]. To find a shortest odd cycle containing node $v$, Grötschel,
Lovász, and Schrijver construct an auxiliary bipartite graph $\tilde{G} = ((V^1, V^2), \tilde{E})$ and
cost vectors $c \in [0,1]^E$ and $\tilde{c} \in [0,1]^{\tilde{E}}$ as follows. Each node $v \in V$ is split into two
nodes $v^1$ and $v^2$, where $v^i$ is included in $V^i$ ($i = 1, 2$). For each edge $\{u,v\} \in E$, we add the
edges $\{u^1, v^2\}$ and $\{u^2, v^1\}$ to $\tilde{E}$, and set $c_{\{u,v\}} = \tilde{c}_{\{u^1,v^2\}} = \tilde{c}_{\{u^2,v^1\}} = 1 - x_u - x_v$.

Observe that a path from $u^1 \in V^1$ to $v^2 \in V^2$ in $\tilde{G}$ corresponds to a walk of odd length
in $G$ from $u$ to $v$.




Fig. 1. Identifying an odd hole in a closed walk.



A shortest path from $v^1$ to $v^2$ in $\tilde{G}$ corresponds to a shortest odd length closed walk
in $G$ containing $v$. The reason that we are looking for a shortest path is that a subset of
nodes $H$ that induces an odd hole that corresponds to a violated odd hole inequality will
have $c(E(H)) < 1$, and a short closed walk in $G$ is more likely to lead to a violated
lifted odd hole inequality than a long one. Shortest paths in a graph with non-negative
edge lengths can be found using Dijkstra's algorithm [11]. Hence, we can find a closed
walk

$$C := (v = v_0, e_1, v_1, e_2, v_2, \ldots, v_{k-1}, e_k, v_k = v)$$

in $G$ with odd $k$ that is minimal with respect to $c$ and $|C|$ by using Dijkstra's algorithm to
find a shortest path (with respect to $\tilde{c}$) of minimal cardinality from $v^1$ to $v^2$ in $\tilde{G}$. Some of
the $v_i$ may occur more than once, and the walk may have chords. Let $j \in \{2, \ldots, k-1\}$ be
the smallest index such that there exists an edge $\{v_i, v_j\} \in E$ for some $i \in \{0, \ldots, j-2\}$
(such $i, j$ exist because $\{v_0, v_{k-1}\} \in E$). Let $i \in \{0, \ldots, j-2\}$ be the largest index such
that $\{v_i, v_j\} \in E$. Let $H := \{v_i, v_{i+1}, \ldots, v_j\}$. We claim that $H$ induces an odd hole
in $G$ (see Figure 1). Clearly $H$ induces a cycle. If $|H| = 3$ then $H$ is a clique and we
ignore it. Otherwise, by choice of $i$ and $j$, $H$ does not contain chords. Now suppose,
seeking contradiction, that $|H| = j - i + 1$ is even. Then,

$$C' := (v = v_0, e_1, \ldots, v_{i-1}, e_i, v_i, \{v_i, v_j\}, v_j, e_{j+1}, v_{j+1}, \ldots, e_k, v_k = v)$$

is an odd length closed walk in $G$ containing $v$. Moreover,
$c(\{e_{i+1}, \ldots, e_j\}) = (j - i) - 2\sum_{p=i+1}^{j-1} x_{v_p} - x_{v_i} - x_{v_j}$. It follows from
$x(\{v_p, v_{p+1}\}) \le 1$ that $\sum_{p=i+1}^{j-1} x_{v_p} \le (j - i - 1)/2$. Therefore,

$$c(\{e_{i+1}, \ldots, e_j\}) \ge (j - i) - (j - i - 1) - x_{v_i} - x_{v_j} = 1 - x_{v_i} - x_{v_j} = c_{\{v_i, v_j\}},$$

so $C'$ is not longer than $C$ with respect to $c$. However, $C'$ is of smaller cardinality, which
contradicts our choice of $C$. Hence $H$ induces an odd hole in $G$.
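The auxiliary-graph construction used in this separation routine can be sketched as follows. The code runs Dijkstra's algorithm on the split graph implicitly, representing the copy $v^1$ as the pair `(v, 1)` and $v^2$ as `(v, 2)`. It is a simplified illustration: the tie-breaking on cardinality and the extraction of the hole $H$ from the walk are omitted, and the function name and data layout are assumptions.

```python
import heapq

def shortest_odd_closed_walk(adj, x, v):
    """Return (cost, walk) for a cheapest odd closed walk through v, or None.

    Edge {u, w} costs 1 - x[u] - x[w]; costs are clamped at 0 to guard
    against rounding noise in a fractional LP solution.
    """
    source, target = (v, 1), (v, 2)
    dist = {source: 0.0}
    pred = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        u, side = node
        for w in adj[u]:
            nxt = (w, 3 - side)               # every edge switches sides
            nd = d + max(0.0, 1.0 - x[u] - x[w])
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                pred[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None                           # no odd cycle through v
    walk = [target]
    while walk[-1] != source:
        walk.append(pred[walk[-1]])
    walk.reverse()
    return dist[target], [u for u, _ in walk]

# Hypothetical fractional point x = 1/2 on a 5-cycle and on a 4-cycle.
c5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
half5 = {u: 0.5 for u in c5}
half4 = {u: 0.5 for u in c4}
odd_walk = shortest_odd_closed_walk(c5, half5, 0)
no_walk = shortest_odd_closed_walk(c4, half4, 0)
```

On the 5-cycle the walk has cost 0, which is below 1, so the corresponding odd hole inequality is violated by the all-halves point; the bipartite 4-cycle has no odd closed walk at all.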

Given an odd hole $H$, a lifted odd hole inequality is of the form

$$x(H) + \sum_{v \in V \setminus H} \alpha_v x_v \le \lfloor |H|/2 \rfloor$$

for some suitable vector $\alpha \in \mathbb{N}^{V \setminus H}$. Assume without loss of generality, that the nodes in
$V \setminus H$ are indexed as $\{v_1, v_2, \ldots, v_{|V \setminus H|}\}$. Padberg [23] has shown that a lifted odd
hole induces a facet if we choose

$$\alpha_{v_i} = \lfloor |H|/2 \rfloor - \max\Big\{x(H) + \sum_{j=1}^{i-1} \alpha_{v_j} x_{v_j} : x \in X^i_{IS}\Big\},$$

where $X^i_{IS}$ denotes all incidence vectors of independent sets on $G$ restricted to
$(H \cup \{v_1, \ldots, v_{i-1}\}) \setminus N(v_i)$. Nemhauser and Sigismondi [22] observed that $\alpha_v = 0$ for
$v \in V \setminus H$ if $|N(v) \cap H| \le 2$. This implies that the independent set problems that have
to be solved in order to compute the lifting coefficients $\alpha_v$ are relatively small. In our
computational experiments on map labelling problems we have observed independent
set problems with up to approximately 40 nodes. We lift the variables in non-decreasing
lexicographic order of $(|\tfrac{1}{2} - x_v|, -|N(v) \cap H|)$, where ties are broken at random.
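The sequential lifting formula can be illustrated with a small sketch that solves each restricted weighted independent set problem by enumeration. This replaces the path decomposition approach of the paper with brute force purely for illustration; the function name, the fixed lifting order, and the example graph are assumptions.

```python
from itertools import combinations

def lift_odd_hole(adj, hole, order):
    """Compute lifting coefficients alpha_v for the nodes in `order`.

    Hole nodes have coefficient 1; each new coefficient is floor(|H|/2)
    minus the best weight of an independent set that avoids N(v).
    """
    rhs = len(hole) // 2
    alpha = {}
    lifted = []
    for v in order:
        pool = [u for u in list(hole) + lifted if u not in adj[v]]
        best = 0
        for r in range(len(pool) + 1):
            for subset in combinations(pool, r):
                if all(b not in adj[a] for a, b in combinations(subset, 2)):
                    best = max(best, sum(alpha.get(u, 1) for u in subset))
        alpha[v] = rhs - best
        lifted.append(v)
    return alpha

# Hypothetical graph: a 5-hole 0..4, a hub 5 adjacent to the whole hole,
# and a node 6 adjacent to the hole nodes 0 and 1 only.
wheel = {0: {1, 4, 5, 6}, 1: {0, 2, 5, 6}, 2: {1, 3, 5}, 3: {2, 4, 5},
         4: {3, 0, 5}, 5: {0, 1, 2, 3, 4}, 6: {0, 1}}
coefficients = lift_odd_hole(wheel, [0, 1, 2, 3, 4], [5, 6])
```

The hub receives coefficient 2, giving the wheel inequality $x(H) + 2x_5 \le 2$, while node 6 with only two neighbours in the hole receives coefficient 0, in line with the observation of Nemhauser and Sigismondi quoted above.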

Moreover, we can use the fact that a hole has small path width [12]. Therefore we
can build an initial path decomposition of the hole that we can extend in a straightforward
greedy fashion each time we find a variable with a nonzero lifting coefficient. In
our experiments on map labelling problems, it rarely occurred that the path width of
the resulting path decomposition exceeded 20. This enables us to compute the lifting
coefficients very efficiently. The advantage of using a path decomposition algorithm is
that the size of a maximum independent set can be found using the path decomposition
in time that is exponential only in the path width, not in the size of the graph.

2.3 A Local Search Heuristic for Independent Sets 

We proceed by describing the primal heuristics we use to improve $x^*$. First we apply a
simple rounding heuristic that gives a feasible solution. We then use this solution as a
starting point for a combination of local search heuristics.

Suppose we have a fractional solution $x^{LP}$ to problem $\bar{\Pi}$ and we want to construct
an integer solution $x^{R}$ to problem $\Pi$. The rounding procedure works as follows. For
each $v \in V$, if $x^{LP}_v > \tfrac{1}{2}$ we set $x^{R}_v := 1$, otherwise we set $x^{R}_v := 0$. The feasibility of
$x^{R}$ follows from the observation that $x^{LP}_v > \tfrac{1}{2}$ implies $x^{LP}_u < \tfrac{1}{2}$ for all $u \in N(v)$, which
together gives $x^{R}(\{u,v\}) \le 1$ for each $\{u,v\} \in E$.

After the rounding procedure, we first apply a 1-opt procedure. We start by taking
$x^{1\text{-opt}} := x^{R}$. For each $v \in V$, we check if all $u \in N(v)$ have $x^{1\text{-opt}}_u = 0$ and if so,
we set $x^{1\text{-opt}}_v := 1$. Clearly, $x^{1\text{-opt}}$ is a feasible solution if $x^{R}$ is a feasible solution.
The combined time complexity of the rounding and the 1-opt procedures is $O(n)$.

Finally, we apply a 2-opt procedure. We start by taking $x^{2\text{-opt}} := x^{1\text{-opt}}$. For each
$v \in V$, we check whether there exist $u, w \in N(v)$ such that $\tilde{x} := x^{2\text{-opt}} - e_v + e_u + e_w$
is feasible. As soon as we find such $u, w$, we replace $x^{2\text{-opt}}$ by $\tilde{x}$. We continue until no
such $u, v, w$ exists. The 2-opt algorithm can be implemented to work in
$O((z(x^{2\text{-opt}}) - z(x^{1\text{-opt}}) + 1)n^2)$ time.

When applying the above heuristics from a node in the branch and bound tree, we
make sure that all variables that are set by branching, reduced cost, logical implication,
and SRS remain at their value in $x^{LP}$.
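The rounding, 1-opt and 2-opt steps above can be sketched in a few lines. This is a straightforward illustration rather than the tuned implementation of the paper: it ignores variables that are already set in the branch and bound tree, makes no attempt to meet the stated running times, and all names are assumptions.

```python
def round_and_one_opt(adj, x_lp):
    """Round values above 1/2 to 1, then add nodes with no chosen neighbour."""
    x = {v: 1 if x_lp[v] > 0.5 else 0 for v in adj}
    for v in adj:
        if x[v] == 0 and all(x[u] == 0 for u in adj[v]):
            x[v] = 1
    return x

def two_opt(adj, x):
    """Repeatedly swap one chosen node v for two of its neighbours u, w."""
    x = dict(x)
    improved = True
    while improved:
        improved = False
        for v in adj:
            if x[v] != 1:
                continue
            # neighbours of v whose only chosen neighbour is v itself
            cands = [u for u in sorted(adj[v])
                     if all(x[t] == 0 for t in adj[u] if t != v)]
            pair = next(((u, w) for i, u in enumerate(cands)
                         for w in cands[i + 1:] if w not in adj[u]), None)
            if pair:
                x[v], x[pair[0]], x[pair[1]] = 0, 1, 1
                improved = True
                break
    return x

# Hypothetical inputs: an all-halves LP point on a path, and a star whose
# centre is chosen.
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
rounded = round_and_one_opt(path3, {0: 0.5, 1: 0.5, 2: 0.5})
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
swapped = two_opt(star, {0: 1, 1: 0, 2: 0, 3: 0})
```

After the swap in the star example, leaf 3 has no chosen neighbour left, so a further 1-opt pass would add it as well.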

3 Computational Results 

In order to evaluate the behaviour of our algorithm on independent set problems that
come from map labelling, we implemented our algorithm in C++, using our own
framework for solving MIPs based on the CPLEX 6.0.1 linear program solver [16].
We use the LEDA graph data structure [21] to represent graphs. We tested our
algorithm on the same class of map labelling problems as used by Christensen [8] and van
Dijk [10]. These problems are generated by placing $n$ (integer) points on a standard
map of size 792 by 612. The points have to be labelled using labels of size $30 \times 7$. For
each $n \in \{100, 150, \ldots, 750, 800\}$ we randomly generated 50 maps. Figure 2 shows
the average branch and bound tree sizes and running times of our algorithm on these
problem instances. The reported running times were observed on a 168 MHz Sun Ultra
Enterprise 2.

Fig. 2. Performance of our algorithm on random map labelling problems. [Plots not reproduced: average branch and bound tree size (left) and average CPU time in seconds (right), each against the number of cities.]

To evaluate the influence of SRS on the performance of our algorithm, we conducted
a second set of experiments. In these experiments, the density of the problems was kept
the same by making the map size a function of the number of cities. For a problem
with $n$ cities, we use a map of size $\lfloor 792/\sqrt{750/n} \rfloor \times \lfloor 612/\sqrt{750/n} \rfloor$. For each
$n \in \{600, 650, \ldots, 900, 950\}$ we randomly generated 50 maps. From each generated map,
we selected its largest connected component and used that as the input for our algorithm
both with and without SRS. Figure 3 shows the average branch and bound tree sizes and
running times for these experiments. The reported branch and bound tree sizes for the
case with SRS include the nodes in branch and bound trees of recursive calls.
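As a quick check of the scaling rule, the following sketch evaluates the map size for a given number of cities. It assumes the rule reads $\lfloor 792/\sqrt{750/n} \rfloor \times \lfloor 612/\sqrt{750/n} \rfloor$, so that the map area grows linearly in $n$ and a map with 750 cities has the standard size; the function name is an illustration only.

```python
import math

def map_size(n, base=(792, 612), base_n=750):
    """Scale the standard map so that the density of cities stays constant."""
    scale = math.sqrt(base_n / n)
    return tuple(math.floor(side / scale) for side in base)

standard = map_size(750)
smaller = map_size(600)
```

For 600 cities this gives a map of 708 by 547, roughly 80 percent of the area of the standard map, matching the ratio 600/750.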



Fig. 3. The impact of SRS. [Plots not reproduced: average branch and bound tree size (left) and average CPU time in seconds (right), each against the initial number of cities in the sample.]



The experiments show that map labelling instances with up to 950 cities can be 
solved to optimality using our branch and cut algorithm with reasonable computational 
effort. It turns out to be crucial to exploit the sparsity of the resulting independent set 
problems, and the new variable setting procedure, SRS, is very efficient in doing so. 

Christensen [8] reports on experiments with different heuristic algorithms on problem
instances with up to 1500 cities. The running times of different heuristics as mentioned by
Christensen fall in the range of tenths of a second (for random solutions) to approximately
1000 seconds (for the heuristic by Zoraster [30]). We have observed in our experiments
that local search, starting from fractional LP solutions in the branch and bound tree, finds
an optimal solution already after a couple of minutes of computing time. Moreover, due
to the strong relaxation resulting from the maximal clique and lifted odd hole inequalities,
we can prove our solutions to be at most 1 or 2 labels away from optimal at an early stage
of the computation. Most of the time is spent by our algorithm in decreasing the upper
bound to prove optimality. We conclude that we can turn our branch and cut algorithm
into a very robust heuristic for map labelling by adding heuristic pruning of the search
tree.

Abstract. Let $G$ be a bipartite graph with positive integer weights on the edges and
without isolated nodes. Let $n$ and $W$ be the node count and the total weight of $G$.
We present a new decomposition theorem for maximum weight bipartite matchings
and use it to design an $O(\sqrt{n}W)$-time algorithm for computing a maximum weight
matching of $G$. This algorithm bridges a long-standing gap between the best known
time complexity of computing a maximum weight matching and that of computing
a maximum cardinality matching. Given $G$ and a maximum weight matching of $G$,
we can further compute the weight of a maximum weight matching of $G - \{u\}$
for all nodes $u$ in $O(W)$ time. As immediate applications of these algorithms,
the best known time complexity of computing a maximum agreement subtree of
two $\ell$-leaf rooted or unrooted evolutionary trees is reduced from $O(\ell^{1.5} \log \ell)$ to
$O(\ell^{1.5})$.


1 Introduction 



Let G = (X,Y,E) be a bipartite graph with positive integer weights on the edges. 
A matching of G is a subset of node-disjoint edges of G. Let mwm(G) (respectively, 
mm(G)) denote the maximum weight (respectively, cardinality) of any matching of G. 
A maximum weight matching is one whose weight is mwm(G). Let N be the largest 
weight of any edge. Let W be the total weight of G. Let n and m be the numbers of 
nodes and edges of $G$; to avoid triviality, we maintain $m = \Omega(n)$ throughout the paper.

The problem of finding a maximum weight matching of a given $G$ has a rich history.
The first known polynomial-time algorithm is the $O(n^3)$-time Hungarian method [23].
Fredman and Tarjan [12] used Fibonacci heaps to improve the time to $O(n(m + n \log n))$.
Gabow [13] introduced scaling to solve the problem in $O(n^{3/4} m \log N)$ time by taking
advantage of the integrality of edge weights. Gabow and Tarjan [14] improved the scaling
method to further reduce the time to $O(\sqrt{n}\, m \log(nN))$. For the case where the edges all
have weight 1, i.e., $N = 1$ and $W = m$, Hopcroft and Karp [18] gave an $O(\sqrt{n}W)$-time
algorithm.* It has remained open since [14] whether the gap between the running times
of the latter two algorithms can be closed for the case $W = o(m \log(nN))$.

This paper resolves the open problem in the affirmative by giving an $O(\sqrt{n}W)$-time
algorithm for general $W$. The algorithm does not use scaling but instead employs a novel
decomposition theorem for weighted bipartite matchings. We also use the theorem to
solve the all-cavity maximum weight matching problem which, given $G$ and a maximum
weight matching of $G$, asks for mwm($G - \{u\}$) for all nodes $u$ in $G$. The case where
$N = 1$ has been studied by Chung [4]. Recently, Kao, Lam, Sung, and Ting [21] gave
an $O(\sqrt{n}\, m \log N)$-time algorithm for general $N$. This paper presents a new algorithm
that runs in $O(W)$ time.

As immediate applications, we use the new matching algorithms to speed up two of 
the best known algorithms for comparing evolutionary trees, which are trees with leaves 
labeled by distinct species [1,2,17,26]. Different models of the evolutionary relationship 
of the same species may result in different evolutionary trees. An agreement subtree 
of two evolutionary trees is an evolutionary tree which is also a topological subtree 
of the two given trees. A maximum agreement subtree is one with the largest possible 
number of leaves. A basic problem in computational biology is to extract the maximum 
amount of evolutionary information shared by two given models of evolution. To a useful 
extent, this problem can be solved by computing a maximum agreement subtree of the 
corresponding evolutionary trees [11]. 

Algorithms for computing a maximum agreement subtree for unrooted or rooted
evolutionary trees have been studied intensively in the past few years. The unrooted case
is technically more difficult than the rooted case. Steel and Warnow [25] gave the first
polynomial-time algorithm for two unrooted trees with no restriction on the degrees.
Let $\ell$ be the number of leaves in the input trees. Their algorithm runs in $O(\ell^{4.5} \log \ell)$
time. Farach and Thorup reduced the time to $O(\ell^{2+o(1)})$ for unrooted trees [8] and
$O(\ell^{1.5} \log \ell)$ for rooted trees [9]. For the unrooted case, the time complexity was
improved by Lam, Sung, and Ting [24] to $O(\ell^{1.75+o(1)})$ and later by Kao, Lam, Przytycka,
Sung, and Ting [20,22] to $O(\ell^{1.5} \log \ell)$. Faster algorithms for trees with degrees
bounded by a constant have also been discovered [5,7,19,20,22]. In this paper, we use the
new algorithm for computing a single maximum weight matching to reduce the time
for general rooted trees to $O(\ell^{1.5})$. We use both new matching algorithms to reduce the
time for general unrooted trees to $O(\ell^{1.5})$ as well.

Section 2 presents the decomposition theorem and uses it to compute the weight 
of a maximum weight matching. Section 3 gives an algorithm to construct a maximum 
weight matching. Section 4 solves the all-cavity matching problem. Section 5 applies 
the matching algorithms to maximum agreement subtrees. 



* For $N = 1$, Feder and Motwani [10] gave another matching algorithm which is more efficient for
dense graphs; the time complexity is $O(\sqrt{n}W/(\log n/\log(n^2/m)))$, which remains $O(\sqrt{n}W)$
whenever the graph has at most $n^{2-\epsilon}$ edges for any $\epsilon > 0$.

2 The Decomposition Theorem

In §2.1, we state the decomposition theorem and use it to compute the weight mwm($G$)
in $O(\sqrt{n}W)$ time. In §2.2 and §2.3, we prove the theorem. In §3, we further construct a
maximum weight matching itself within the same time bound.

2.1 An Algorithm for Computing mwm(G) 

Let $w(u,v)$ denote the weight of an edge $uv \in G$; if $u$ is not adjacent to $v$, let $w(u,v) = 0$.
A cover of $G$ is a function $C : X \cup Y \to \{0, 1, 2, \ldots\}$ such that $C(x) + C(y) \ge w(x,y)$
for all $x \in X$ and $y \in Y$. $C$ is a minimum weight cover if $\sum_{z \in X \cup Y} C(z)$ is the smallest
possible. A minimum weight cover is a dual of a maximum weight matching as stated
in the next fact.

Fact 1 (see [3]) Let $C$ be a cover and $M$ be a matching of $G$. The following statements
are equivalent.

1. $C$ is a minimum weight cover and $M$ is a maximum weight matching of $G$.

2. $\sum_{uv \in M} w(u,v) = \sum_{u \in X \cup Y} C(u)$.

3. Every node in $\{u \mid C(u) > 0\}$ is matched by some edge in $M$, and $C(u) + C(v) =
w(u,v)$ for all $uv \in M$.
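Fact 1 gives a cheap optimality certificate: a matching and a cover with equal total weight are both optimal. The sketch below checks the second statement of the fact for given inputs. The dictionary representation of the weights, the function names, and the example graph are assumptions made for illustration, and the cover is assumed to take non-negative integer values.

```python
def is_matching(matching):
    """A matching uses every node at most once."""
    used = [node for edge in matching for node in edge]
    return len(used) == len(set(used))

def is_cover(weights, cover):
    """C(u) + C(v) >= w(u, v) on every edge (non-edges have weight 0)."""
    return all(cover[u] + cover[v] >= w for (u, v), w in weights.items())

def certifies_optimality(weights, cover, matching):
    """Statement 2 of Fact 1: matching weight equals total cover weight."""
    return (is_matching(matching) and is_cover(weights, cover)
            and sum(weights[e] for e in matching) == sum(cover.values()))

# Hypothetical graph with X = {a, b}, Y = {x}.
w = {("a", "x"): 2, ("b", "x"): 1}
good = certifies_optimality(w, {"a": 1, "b": 0, "x": 1}, [("a", "x")])
bad_matching = certifies_optimality(w, {"a": 1, "b": 0, "x": 1}, [("b", "x")])
bad_cover = certifies_optimality(w, {"a": 2, "b": 1, "x": 0}, [("a", "x")])
```

In the first call both nodes with positive cover value are matched and the cover is tight on the matched edge, which is the third statement of the fact.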

For an integer $h \in [1, N]$, we divide $G$ into two lighter bipartite graphs $G_h$ and $G_h^\Delta$
as follows. Note that the total weight of $G_h$ and $G_h^\Delta$ is at most $W$.

• $G_h$ is formed by the edges $uv$ of $G$ with $w(u,v) \in [N - h + 1, N]$. Each edge $uv$
in $G_h$ has weight $w(u,v) - (N - h)$. For example, $G_1$ is formed by the heaviest
edges of $G$, and the weight of each edge is exactly one.

• Let $C_h$ be a minimum weight cover of $G_h$. $G_h^\Delta$ is formed by the edges $uv$ of $G$ with
$w(u,v) - C_h(u) - C_h(v) > 0$. The weight of $uv$ is $w(u,v) - C_h(u) - C_h(v)$.

The next theorem is the decomposition theorem.

Theorem 1. mwm($G$) = mwm($G_h$) + mwm($G_h^\Delta$); in particular, mwm($G$) = mm($G_1$) +
mwm($G_1^\Delta$).

Proof. See §2.3.

Theorem 1 suggests the following recursive algorithm to compute mwm($G$).

Procedure Compute-MWM($G$)

1. Construct $G_1$ from $G$.

2. Compute mm($G_1$) and find a minimum weight cover $C_1$ of $G_1$.

3. Compute $G_1^\Delta$ from $G$ and $C_1$.

4. If $G_1^\Delta$ is empty, then return mm($G_1$); otherwise, return
mm($G_1$) + Compute-MWM($G_1^\Delta$).

Lemma 2. Compute-MWM($G$) correctly finds mwm($G$) in $O(\sqrt{n}W)$ time.

Proof. The correctness of Compute-MWM follows from Theorem 1. Below we analyze
its running time. We initialize a maximum heap [6] in $O(m)$ time to store the edges
of $G$ according to their weights. Let $T(n, W)$ be the running time of Compute-MWM
excluding this initialization. Let $L$ be the set of the heaviest edges in $G$. Then, Step
1 takes $O(|L| \log m)$ time. In Step 2, a maximum cardinality matching of $G_1$ can be
found in $O(\sqrt{n}|L|)$ time [10,18]. From this matching, $C_1$ can be found in $O(|L|)$ time
[3]. Let $L'$ be the set of the edges of $G$ adjacent to some node $u$ with $C_1(u) > 0$;
i.e., $L'$ consists of the edges of $G$ whose weights are reduced in $G_1^\Delta$. Step 3 updates
every edge of $L'$ in the heap in $O(|L'| \log m)$ time. Since the total weight of $G_1^\Delta$ is at
most $W - |L'|$, Step 4 requires at most $T(n, W - |L'|)$ time. In summary, as $L \subseteq L'$,
$T(n, W) \le O(\sqrt{n}|L'|) + T(n, W - |L'|) = O(\sqrt{n}W)$. Thus, the running time of
Compute-MWM is $T(n, W) + O(m) = O(\sqrt{n}W)$ as stated.
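The recursion of Compute-MWM can be sketched directly from the four steps above. The sketch below is for illustration only and does not achieve the $O(\sqrt{n}W)$ bound: it uses a simple augmenting-path matching routine in place of the algorithms of [10,18], rescans all edges instead of using a heap, and obtains the cover $C_1$ from the matching by the standard König construction. Edge weights are assumed to be given as a dictionary keyed by pairs $(x, y)$ with $x \in X$, $y \in Y$ and disjoint label sets; all names are hypothetical.

```python
def matching_and_cover(left, edges):
    """Maximum cardinality matching size and a minimum vertex cover."""
    adj = {u: [] for u in left}
    for u, v in edges:
        adj[u].append(v)
    match_right = {}                       # right node -> matched left node

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match_right or augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    size = sum(1 for u in left if augment(u, set()))
    # Koenig: explore alternating paths from the unmatched left nodes.
    reached_left = set(left) - set(match_right.values())
    reached_right = set()
    stack = list(reached_left)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in reached_right:
                reached_right.add(v)
                partner = match_right.get(v)
                if partner is not None and partner not in reached_left:
                    reached_left.add(partner)
                    stack.append(partner)
    return size, (set(left) - reached_left) | reached_right

def compute_mwm(weights):
    """Weight of a maximum weight matching via mm(G_1) + mwm(G_1^Delta)."""
    if not weights:
        return 0
    heaviest = max(weights.values())
    g1 = [e for e, w in weights.items() if w == heaviest]     # Step 1
    left = {x for x, _ in weights}
    size, cover = matching_and_cover(left, g1)                # Step 2
    residual = {}                                             # Step 3
    for (x, y), w in weights.items():
        reduced = w - (x in cover) - (y in cover)
        if reduced > 0:
            residual[(x, y)] = reduced
    return size + compute_mwm(residual)                       # Step 4

example = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 2, ("b", "y"): 1}
```

Every heaviest edge has at least one endpoint in the cover, so the largest weight drops by at least one per call and the recursion depth is at most $N$.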

2.2 Unfolded Graphs 

The proof of Theorem 1 makes use of the unfolded graph $\phi(G)$ of $G$ defined as follows.

• For each node $u$ of $G$, $\phi(G)$ has $\alpha$ images of $u$, denoted as $u^1, u^2, \ldots, u^\alpha$, where $\alpha$
is the weight of the heaviest edge incident to $u$.

• For each edge $uv$ of $G$, $\phi(G)$ has the edges $u^1 v^\beta, u^2 v^{\beta-1}, \ldots, u^\beta v^1$, where $\beta =
w(u,v)$.

The next lemma relates $G$ and $\phi(G)$. Let $M$ be a matching of $G$. Then, $\phi(M) =
\bigcup_{uv \in M} \{u^1 v^\beta, u^2 v^{\beta-1}, \ldots, u^\beta v^1 \mid \beta = w(u,v)\}$. A path in $G$ is alternating for $M$ if (1) its
edges alternate between being in $M$ and being not and (2) in case the first (respectively,
last) edge of the path is not in $M$, the first (respectively, last) node of the path is not
matched by $M$.
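The unfolding is easy to state in code: an edge of weight $\beta$ becomes $\beta$ unit edges $u^i v^j$ with $i + j = \beta + 1$. The sketch below builds the images and edges of $\phi(G)$ for a weight dictionary; the representation of the image $u^i$ as the pair `(u, i)` and the function names are assumptions.

```python
def images(weights):
    """Number of images of each node: the heaviest incident edge weight."""
    alpha = {}
    for (u, v), w in weights.items():
        alpha[u] = max(alpha.get(u, 0), w)
        alpha[v] = max(alpha.get(v, 0), w)
    return alpha

def unfold(weights):
    """Edges of the unfolded graph: u^i v^j with i + j = w(u, v) + 1."""
    return [((u, i), (v, w + 1 - i))
            for (u, v), w in weights.items()
            for i in range(1, w + 1)]

g = {("a", "x"): 2, ("b", "x"): 1}
phi_nodes = images(g)
phi_edges = unfold(g)
```

Applying `unfold` to the edges of a matching $M$ yields $\phi(M)$, which has exactly as many edges as the weight of $M$.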

Lemma 3. If $M$ is a maximum weight matching of $G$, then $\phi(M)$ is a maximum
cardinality matching of $\phi(G)$. Consequently, mwm($G$) = mm($\phi(G)$).

Proof. Since $\phi(M)$ is a matching of $\phi(G)$ and mwm($G$) $= \sum_{uv \in M} w(u,v) = |\phi(M)|$,
it suffices to prove $|\phi(M)|$ = mm($\phi(G)$). From basics of bipartite matchings [15,16], to
prove this by contradiction, we may assume that $\phi(G)$ has an alternating path
$a_1^{i_1}, a_2^{i_2}, \ldots, a_p^{i_p}$ for $\phi(M)$ such that $p$ is even (so the path has an odd number of edges)
and the first and the last edge of the path are not in $\phi(M)$. Let $P$ be the corresponding
path $a_1, a_2, \ldots, a_p$ in $G$. The net change of a path in $G$ is the total weight of its edges
not in $M$ minus that of its edges in $M$. To contradict the maximality of $M$, it suffices to
use $P$ to construct an alternating path $Q$ of $G$ for $M$ with a positive net change.

Note that the edges of $P$ alternate between being in $M$ and being not and that
$a_1 a_2, a_{p-1} a_p \notin M$. Since $w(a_k, a_{k+1}) = i_k + i_{k+1} - 1$, the net change of $P$ is
$w(a_1, a_2) - w(a_2, a_3) + \cdots + w(a_{p-1}, a_p) = i_1 + i_p - 1 \ge 1$. If neither $a_1$ nor $a_p$ is
matched by $M$, then $Q = P$ is as desired. Otherwise, there are two cases: (1) exactly one
of $a_1$ and $a_p$ is matched by $M$; (2) both $a_1$ and $a_p$ are matched by $M$. By symmetry, Cases
1 and 2 are similar, and we only detail the proof of Case 1 while further assuming that
$a_1$ is matched by $M$. Since $a_1 a_2 \notin M$, $a_0 a_1 \in M$ for some $a_0$. Let $Q = a_0, a_1, \ldots, a_p$.
Then, $Q$ is alternating for $M$. The net change of $Q$ is $-w(a_0, a_1) + i_1 + i_p - 1$. Since
$a_0 a_1 \in M$, $\phi(M)$ has the edges $a_0^1 a_1^\beta, \ldots, a_0^\beta a_1^1$, where $\beta = w(a_0, a_1)$. Then, since $a_1^{i_1}$
is not matched by $\phi(M)$, $i_1 \ge w(a_0, a_1) + 1$. Therefore, the net change of $Q$ is positive,
and $Q$ is as desired.

Fig. 1. The relationship between the graphs involved in the proof of Theorem 1. Matchings that
have equal value are connected by a dotted line.

2.3 Proof of Theorem 1 

Let $\phi(G)|C_h$ be the subgraph of $\phi(G)$ induced by the edges incident to nodes $u^j$ with
$j \le C_h(u)$. Lemma 4 below shows the relationship among
$G$, $G_h$, $G_h^\Delta$, $\phi(G)$, $\phi(G)|C_h$, $\phi(G) - \phi(G)|C_h$, $\phi(G_h^\Delta)$, as depicted in Figure 1.
Theorem 1 follows immediately from Lemmas 3 and 4.

Let $V(H)$ denote the node set of a graph $H$.

Lemma 4. 1. mm($\phi(G)|C_h$) = mwm($G_h$).
2. mm($\phi(G) - \phi(G)|C_h$) = mm($\phi(G_h^\Delta)$).
3. mm($\phi(G)$) = mm($\phi(G)|C_h$) + mm($\phi(G) - \phi(G)|C_h$).
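For readers who want the deduction of Theorem 1 from Lemmas 3 and 4 written out, the chain of equalities is as follows. This is only a restatement of the lemmas in order of use, with Lemma 3 applied once to $G$ and once to $G_h^\Delta$.

```latex
\begin{align*}
\mathrm{mwm}(G)
  &= \mathrm{mm}(\phi(G))
     && \text{(Lemma 3)} \\
  &= \mathrm{mm}(\phi(G)|C_h) + \mathrm{mm}(\phi(G) - \phi(G)|C_h)
     && \text{(Lemma 4, Statement 3)} \\
  &= \mathrm{mwm}(G_h) + \mathrm{mm}(\phi(G_h^\Delta))
     && \text{(Lemma 4, Statements 1 and 2)} \\
  &= \mathrm{mwm}(G_h) + \mathrm{mwm}(G_h^\Delta)
     && \text{(Lemma 3 applied to } G_h^\Delta\text{)}
\end{align*}
```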

Proof. The statements are proved as follows.

Statement 1. Let $D$ be a minimum weight cover of $\phi(G)|C_h$. Since $C_h$ is a
minimum weight cover of $G_h$, by Fact 1, it suffices to show $\sum_{u^i} D(u^i) = \sum_u C_h(u)$. For
$\sum_{u^i} D(u^i) \le \sum_u C_h(u)$, consider $D' : V(\phi(G)|C_h) \to \{0, 1, 2, \ldots\}$ where for every
$u^i \in V(\phi(G)|C_h)$, $D'(u^i) = 1$ if $i \le C_h(u)$ and 0 otherwise. Since all the edges
in $\phi(G)|C_h$ must be of weight 1 and must be attached to some $u^i$ where $i \le C_h(u)$,
$D'$ is a weighted cover of $\phi(G)|C_h$. Thus, $\sum_{u^i} D(u^i) \le \sum_{u^i} D'(u^i) \le \sum_u C_h(u)$.
For $\sum_{u^i} D(u^i) \ge \sum_u C_h(u)$, consider $D'' : V(G_h) \to \{0, 1, 2, \ldots\}$ where for every
$u \in V(G_h)$, $D''(u) = \sum_i D(u^i)$. We claim that $D''$ is a weighted cover of $G_h$ and
thus, $\sum_{u^i} D(u^i) = \sum_u D''(u) \ge \sum_u C_h(u)$. To show that $D''$ is a weighted cover
of $G_h$, note that for every edge $uv$ in $G_h$ of weight $z$, according to the construction,
$\phi(G)|C_h$ has at least $z$ edges of the form $u^i v^j$. As $D$ is a node cover of $\phi(G)|C_h$,
$D''(u) + D''(v) = \sum_i D(u^i) + \sum_j D(v^j) \ge z$. Hence, $D''$ is a weighted cover of $G_h$.
Statement 2. To show mm($\phi(G) - \phi(G)|C_h$) $\le$ mm($\phi(G_h^\Delta)$), let $M$ be a maximum
cardinality matching of $\phi(G) - \phi(G)|C_h$. Note that for any edge $u^i v^j \in M$, $i > C_h(u)$
and $j > C_h(v)$. Let $M' = \{u^{i - C_h(u)} v^{j - C_h(v)} \mid u^i v^j \in M\}$. Note that $M'$ is a
matching in $\phi(G_h^\Delta)$ and $|M| = |M'|$. Thus, mm($\phi(G) - \phi(G)|C_h$) $= |M| = |M'| \le$
mm($\phi(G_h^\Delta)$). To show mm($\phi(G_h^\Delta)$) $\le$ mm($\phi(G) - \phi(G)|C_h$), let $M''$ be a maximum
cardinality matching of $\phi(G_h^\Delta)$. Similarly, we can verify that
$\{u^{i + C_h(u)} v^{j + C_h(v)} \mid u^i v^j \in M''\}$ is a matching of $\phi(G) - \phi(G)|C_h$. Thus,
mm($\phi(G_h^\Delta)$) $= |M''| \le$ mm($\phi(G) - \phi(G)|C_h$).

Statement 3. Only mm($\phi(G)$) $\ge$ mm($\phi(G)|C_h$) + mm($\phi(G) - \phi(G)|C_h$) will be
proved; the other direction is straightforward. Let $M$ be a maximum weight matching of
$G_h$. Let $M_1 = \{u^i v^j \in \phi(G) \mid uv \in M \text{ and either } i \le C_h(u) \text{ or } j \le C_h(v)\}$. Note that
$M_1$ is a matching of $\phi(G)|C_h$ with mwm($G_h$) edges. By Statement 1, $M_1$
is a maximum cardinality matching of $\phi(G)|C_h$. Let $M_2$ be any maximum cardinality
matching of $\phi(G) - \phi(G)|C_h$. We claim that $M_1 \cup M_2$ forms a matching of $\phi(G)$. Then,
mm($\phi(G)|C_h$) + mm($\phi(G) - \phi(G)|C_h$) $= |M_1| + |M_2| \le$ mm($\phi(G)$).

Suppose $M_1 \cup M_2$ is not a matching. Then there exist two edges $e_1 \in M_1$ and
$e_2 \in M_2$ that share an endpoint. Since $e_1 \in M_1$, it is of the form $u^i v^j$ where $uv \in
M$, $i + j = w(u,v) + 1$, with $i \le C_h(u)$ or $j \le C_h(v)$. Without loss of generality,
assume $i \le C_h(u)$. With respect to $G_h$, $M$ and $C_h$ satisfy Fact 1 and $C_h(u) + C_h(v)$
equals the weight of $uv$ in $G_h$, i.e., $w(u,v) - (N - h)$. Putting all relations together,
$j \ge (N - h) + 1 + C_h(v)$.

As $i \le C_h(u)$, $u^i$ is not adjacent to any edge in $\phi(G) - \phi(G)|C_h$. Thus, the endpoint
shared by $e_1$ and $e_2$ must be $v^j$. Let $e_2$ be $t^k v^j$. As $e_2 \in \phi(G) - \phi(G)|C_h$, $k > C_h(t)$,
$j > C_h(v)$ and $j + k = w(t,v) + 1$. Therefore, $j < w(t,v) + 1 - C_h(t)$. Since
$j \ge (N - h) + 1 + C_h(v)$, $C_h(t) + C_h(v) < w(t,v) - (N - h)$. However, $C_h$ is
a weighted cover of $G_h$ and thus, $C_h(t) + C_h(v) \ge w(t,v) - (N - h)$, reaching a
contradiction.



3 Construct a Maximum Weight Matching 

The algorithm in §2 only computes the value of mwm($G$). To report the edges involved,
we first construct a minimum weight cover of $G$ in $O(\sqrt{n}W)$ time and then use this
cover to construct a maximum weight matching in $O(\sqrt{n}m)$ time.

Lemma 5. Assume that $h$, $G_h$, $C_h$, and $G_h^\Delta$ are defined as in §2. In addition, let $C_h^\Delta$
be any minimum weight cover of $G_h^\Delta$. If $D$ is a function on $V(G)$ such that for every
$u \in V(G)$, $D(u) = C_h(u) + C_h^\Delta(u)$, then $D$ is a minimum weight cover of $G$.

Proof. Note that for any edge $uv$ of $G$, its weight in $G_h^\Delta$ is $w(u,v) - C_h(u) - C_h(v)$.
Since $C_h^\Delta$ is a weighted cover, $C_h^\Delta(u) + C_h^\Delta(v) \ge w(u,v) - C_h(u) - C_h(v)$. Thus,
$D(u) + D(v) = C_h(u) + C_h^\Delta(u) + C_h(v) + C_h^\Delta(v) \ge w(u,v)$. It follows that $D$ is a
weighted cover of $G$. To show that $D$ is minimum, we observe that

$\sum_{u \in V(G)} D(u) = \sum_{u \in V(G)} \left( C_h(u) + C_h^\Delta(u) \right)$

$= \sum_{u \in V(G)} C_h(u) + \sum_{u \in V(G)} C_h^\Delta(u)$

$= \mathrm{mwm}(G_h) + \mathrm{mwm}(G_h^\Delta)$    by Fact 1

$= \mathrm{mwm}(G)$.    by Theorem 1

By Fact 1, $D$ is minimum.

By Lemma 5, a minimum weight cover of G can be computed using a recursive 
procedure similar to Compute-MWM. 

Procedure Compute-Min-Cover(G)

1. Construct $G_1$ from $G$.

2. Find a minimum weight cover $C_1$ of $G_1$.

3. Compute $G_1^\Delta$ from $G$ and $C_1$.

4. If $G_1^\Delta$ is empty, then return $C_1$; otherwise, let $C_1^\Delta$ = Compute-Min-Cover($G_1^\Delta$) and
return $D$ where for all nodes $u$ in $G$, $D(u) = C_1(u) + C_1^\Delta(u)$.
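The four steps above can be sketched as follows, with $h = 1$. This is only an illustration under stated assumptions: the cover of $G_1$ is found by exhaustive search (a stand-in, not the matching-based method the paper relies on for its time bound), and the function names, the dictionary encoding of edges and the example graph are hypothetical.

```python
from itertools import product

def min_cover_bruteforce(nodes, edges):
    """Minimum weight cover by exhaustive search (stand-in for a real algorithm)."""
    top = max(edges.values(), default=0)
    best = None
    for values in product(range(top + 1), repeat=len(nodes)):
        cover = dict(zip(nodes, values))
        if all(cover[u] + cover[v] >= w for (u, v), w in edges.items()):
            if best is None or sum(values) < sum(best.values()):
                best = cover
    return best

def compute_min_cover(nodes, edges):
    """Recursive procedure with h = 1, following steps 1-4 above."""
    if not edges:
        return {u: 0 for u in nodes}
    heaviest = max(edges.values())
    # Step 1: G_1 keeps the heaviest edges, each with weight w - (N - 1) = 1.
    g1 = {e: 1 for e, w in edges.items() if w == heaviest}
    # Step 2: a minimum weight cover C_1 of G_1.
    c1 = min_cover_bruteforce(nodes, g1)
    # Step 3: G_1^Delta keeps the edges whose reduced weight stays positive.
    delta = {(u, v): w - c1[u] - c1[v]
             for (u, v), w in edges.items() if w - c1[u] - c1[v] > 0}
    # Step 4: recurse (an empty G_1^Delta yields the all-zero cover) and add.
    rest = compute_min_cover(nodes, delta)
    return {u: c1[u] + rest[u] for u in nodes}

# Hypothetical example: left nodes a, b and right nodes x, y.
nodes = ["a", "b", "x", "y"]
edges = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 2}
D = compute_min_cover(nodes, edges)
print(D, sum(D.values()))
```

On this example the total of the returned cover equals the weight of a maximum weight matching, as Lemma 5 and Fact 1 predict. The recursion terminates because the largest edge weight drops by at least one at each level.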

Lemma 6. Compute-Min-Cover(G) correctly finds a minimum weight cover of G in
$O(\sqrt{n}\,W)$ time.

Proof. The correctness of Compute-Min-Cover(G) follows from Lemma 5. For the time
complexity, the analysis is similar to that of Lemma 2.

Given a minimum weight cover $D$ of $G$, a maximum weight matching of $G$ is
constructed from $D$ as follows. Let $H$ be a subgraph of $G$ which contains all edges
$uv$ with $w(u,v) = D(u) + D(v)$. We make two copies of $H$. Call them $H^\alpha$ and $H^\beta$.
For every node $u$ of $H$, let $u^\alpha$ and $u^\beta$ denote the corresponding nodes in $H^\alpha$ and $H^\beta$,
respectively. We union $H^\alpha$ and $H^\beta$ to form $H^{\alpha\beta}$, and add to $H^{\alpha\beta}$ the set of edges
$\{u^\alpha u^\beta \mid u \in V(H), D(u) = 0\}$. Note that $H^{\alpha\beta}$ has at most $2n$ nodes and at most $3m$
edges. We find a maximum cardinality matching $K$ of $H^{\alpha\beta}$ using the matching algorithm
in [10, 18]. By Lemma 7, the matching $\{uv \mid u^\alpha v^\alpha \in K\}$ is a maximum weight matching
of $G$. The time complexity of the construction is dominated by the computation of the
maximum cardinality matching $K$, which is $O(\sqrt{n}\,m)$ [10, 18].
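The construction of $H^{\alpha\beta}$ can be illustrated with a short sketch. It assumes the bipartition of $G$ is known, uses a plain augmenting-path matching routine in place of the algorithm of [10, 18], and tags the two copies with the strings "alpha" and "beta"; all names and the example data are hypothetical.

```python
def max_weight_matching_from_cover(left, right, edges, cover):
    """Build H^{alpha beta} from a minimum weight cover and read off a matching of G.
    edges maps (u, v), with u in left and v in right, to a positive integer weight."""
    tight = [(u, v) for (u, v), w in edges.items() if w == cover[u] + cover[v]]
    adj = {}  # nodes on one side of H^{alpha beta} -> neighbours on the other side
    for u, v in tight:
        adj.setdefault((u, "alpha"), []).append((v, "alpha"))  # copy in H^alpha
        adj.setdefault((v, "beta"), []).append((u, "beta"))    # copy in H^beta
    for u in list(left) + list(right):
        if cover[u] == 0:                                      # edge u^alpha u^beta
            src, dst = ("alpha", "beta") if u in left else ("beta", "alpha")
            adj.setdefault((u, src), []).append((u, dst))
    mate = {}  # matched node on the second side -> its partner

    def augment(node, seen):
        for other in adj[node]:
            if other not in seen:
                seen.add(other)
                if other not in mate or augment(mate[other], seen):
                    mate[other] = node
                    return True
        return False

    for node in list(adj):
        augment(node, set())
    # Keep only the edges u^alpha v^alpha of K, mapped back to edges uv of G.
    return {(a[0], b[0]) for b, a in mate.items()
            if a[1] == "alpha" and b[1] == "alpha"}

# Hypothetical example with a minimum weight cover of total weight 4.
left, right = ["a", "b"], ["x", "y"]
edges = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 2}
cover = {"a": 1, "b": 0, "x": 2, "y": 1}
matching = max_weight_matching_from_cover(left, right, edges, cover)
print(matching, sum(edges[e] for e in matching))
```

The graph $H^{\alpha\beta}$ is bipartite with sides (left copies in $H^\alpha$, right copies in $H^\beta$) and (right copies in $H^\alpha$, left copies in $H^\beta$), which is why a bipartite matching routine suffices here.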

Lemma 7. Let $K$ be a maximum cardinality matching of $H^{\alpha\beta}$. Then, $K^\alpha = \{uv \mid
u^\alpha v^\alpha \in K\}$ is a maximum weight matching of $G$.

Proof. First, we show that $H^{\alpha\beta}$ has a perfect matching. Let $M$ be a maximum weight
matching of $G$. Since $D(u) + D(v) = w(u,v)$ for every edge $uv \in M$, $M$ is also a
matching of $H$. Let $U$ be the set of nodes in $H$ unmatched by $M$. By Fact 1, $D(u) = 0$
for all $u \in U$. Let $Q$ be $\{u^\alpha u^\beta \mid u \in U\}$. Let $M^\alpha = \{u^\alpha v^\alpha \mid uv \in M\}$ and
$M^\beta = \{u^\beta v^\beta \mid uv \in M\}$. Note that $Q \cup M^\alpha \cup M^\beta$ forms a matching in $H^{\alpha\beta}$ and every
node in $H^{\alpha\beta}$ is matched by either $Q$, $M^\alpha$ or $M^\beta$. Thus, $H^{\alpha\beta}$ has a perfect matching.

Since $K$ is a perfect matching, for every node $u$ with $D(u) > 0$, $u^\alpha$ must be matched
by $K$. Since there is no edge between $u^\alpha$ and any $x^\beta$ in $H^{\alpha\beta}$, there exists some $v^\alpha$
with $u^\alpha v^\alpha \in K$. Thus, every node $u$ with $D(u) > 0$ must be matched by some edge
in $K^\alpha$. Therefore, $\sum_{uv \in K^\alpha} w(u,v) = \sum_{u \in X \cup Y,\, D(u) > 0} D(u) = \sum_{u \in X \cup Y} D(u) =
\mathrm{mwm}(G)$.



4 All-cavity Maximum Weight Matchings 

This section shows that given $G$ and a maximum weight matching $M$ of $G$, we can
compute mwm($G - \{u\}$) for all nodes $u$ in $G$ using $O(W)$ time.

Recall that $\phi(M)$ is a maximum cardinality matching of $\phi(G)$ (see Lemma 3). For
each $u^i$ in $\phi(G)$, let $A(u^i) = 0$ if there is an even-length alternating path for $\phi(M)$
starting from $u^i$; otherwise, $A(u^i) = 1$. Consider any node $u$ in $G$, let $u^1, \ldots, u^\beta$
be its corresponding nodes in $\phi(G)$. The following lemma states the relationship among
all $A(u^i)$.

Lemma 8. If $A(u^i) = 0$, then for all $i \le j \le \beta$, $A(u^j) = 0$. Furthermore, we can
construct $\beta - i + 1$ node-disjoint even-length alternating paths $P_i, P_{i+1}, \ldots, P_\beta$ for
$\phi(M)$, where each $P_j$ starts from $u^j$.

Proof. As $A(u^i) = 0$, let $P_i = u_0^{i_0}, v_0^{j_0}, u_1^{i_1}, v_1^{j_1}, \ldots, u_p^{i_p}$ of $\phi(G)$ be a
shortest even-length alternating path for $\phi(M)$ where $u_0^{i_0} = u^i$. Note that $P_i$ must be
simple and $u_p^{i_p}$ is not matched by $\phi(M)$.

Based on $P_i$, we can construct an even-length alternating path for $\phi(M)$ starting from
$u^{i+1}$. Let $h = \min\{g \mid u_g^{i_g+1}$ is not matched by $\phi(M)\}$, which must exist according to
the definition of $P_i$ and $\phi(M)$. Then, $P_{i+1} = u_0^{i_0+1}, v_0^{j_0-1}, \ldots, u_h^{i_h+1}$ is
an even-length alternating path for $\phi(M)$.

Similarly, an even-length alternating path $P_j$ for $\phi(M)$ starting from $u^j$ can be found
for $j = i + 2, \ldots, \beta$. Also, it can be verified that $P_i, P_{i+1}, \ldots, P_\beta$ are node-disjoint.

The next lemma shows that, given mwm(G), we can compute mwm($G - \{u\}$) from
the values $A(u^i)$.

Lemma 9. $\sum_{1 \le i \le \beta} A(u^i) = \mathrm{mwm}(G) - \mathrm{mwm}(G - \{u\})$.

Proof. Let $k$ be the largest integer such that $A(u^k) = 1$. By Lemma 8, $A(u^i) = 1$ for
all $1 \le i \le k$, and 0 otherwise. Thus, $\sum_{1 \le i \le \beta} A(u^i) = k$.

Below, we prove the following two equalities:

(1) $\mathrm{mm}(\phi(G) - \{u^1, \ldots, u^k\}) = \mathrm{mm}(\phi(G)) - k$.

(2) $\mathrm{mm}(\phi(G) - \{u^1, \ldots, u^k\}) = \mathrm{mm}(\phi(G) - \{u^1, \ldots, u^\beta\})$.

Then, by Lemma 3, $\mathrm{mwm}(G) = \mathrm{mm}(\phi(G))$ and $\mathrm{mwm}(G - \{u\}) = \mathrm{mm}(\phi(G) -
\{u^1, \ldots, u^\beta\})$. This implies $\mathrm{mwm}(G) - \mathrm{mwm}(G - \{u\}) = k$ and the lemma follows.

First, we show Equality (1). Let $H$ be the set of edges of $\phi(M)$ incident to $u^i$ with $1 \le
i \le k$. Let $M' = \phi(M) - H$. Then, $|M'| = |\phi(M)| - k$. We claim that $M'$ is a maximum
cardinality matching of $\phi(G) - \{u^1, \ldots, u^k\}$. Hence, $\mathrm{mm}(\phi(G) - \{u^1, \ldots, u^k\}) =
|\phi(M)| - k$; Equality (1) follows. Suppose $M'$ is not a maximum cardinality matching
of $\phi(G) - \{u^1, \ldots, u^k\}$. Then, there exists an odd-length alternating path $P$ for $M'$ in
$\phi(G) - \{u^1, \ldots, u^k\}$ whose both ends are not matched by $M'$ [15,16]. $P$ must start
from some node $v^j$ with $u^i v^j \in \phi(M)$ and $i \le k$. Otherwise, $P$ is alternating for $\phi(M)$
in $\phi(G)$ and $\phi(M)$ cannot be a maximum cardinality matching of $\phi(G)$. Let $Q$ be a path
formed by joining $u^i v^j$ with $P$. $Q$ is an even-length alternating path for $\phi(M)$ starting
from $u^i$ in $\phi(G)$. This contradicts the fact that there is no even-length alternating path
for $\phi(M)$ starting from $u^i$ for $i \le k$.

To show Equality (2), by Lemma 8, we construct $\beta - k$ node-disjoint alternating paths
$P_{k+1}, \ldots, P_\beta$ for $\phi(M)$ where $P_j$ starts at $u^j$. Let $M''$ be $\phi(M) \oplus P_{k+1} \oplus \cdots \oplus P_\beta$. Note
that $|M''| = |\phi(M)|$ and there are no edges in $M''$ incident to any $u^i$ with $k + 1 \le i \le \beta$.
Then, $M'' - H$ is a matching of $\phi(G) - \{u^1, \ldots, u^\beta\}$ with size at least $|M''| - k =
|\phi(M)| - k$. Since $\mathrm{mm}(\phi(G) - \{u^1, \ldots, u^k\}) = |\phi(M)| - k$ by Equality (1) and
$\mathrm{mm}(\phi(G) - \{u^1, \ldots, u^\beta\}) \le \mathrm{mm}(\phi(G) - \{u^1, \ldots, u^k\})$, Equality (2) follows.

Thus, to find mwm($G - \{u\}$) for all nodes $u$ in $G$, it suffices to find $A(u^i)$ for all
$u^i$ in $\phi(G)$. This can be done in $O(W)$ time by Lemma 10.

Lemma 10. For all $u^i \in \phi(G)$, $A(u^i)$ can be computed in $O(W)$ time.

Proof. We want to find out, for all $u^i \in \phi(G)$, whether $A(u^i) = 0$, i.e., whether there is an
even-length alternating path $u^i, v^j, \ldots$ for $\phi(M)$. Let $A_j = \{w^i \in \phi(G) \mid$ the shortest
even-length alternating path for $\phi(M)$ starting from $w^i$ is of length $2j\}$. Note that a node
is in $\bigcup_j A_j$ if and only if there is an even-length alternating path for $\phi(M)$ starting from
that node. As $A_0$ is the set of nodes in $\phi(G)$ that are unmatched by $\phi(M)$, $A_0$ can be found
in $O(W)$ time. Observe that a node $x$ is in $A_{j+1}$ if and only if $x \notin A_0 \cup A_1 \cup \cdots \cup A_j$
and there is a length-2 path between $x$ and a node in $A_j$ such that the edge incident to
$x$ belongs to $\phi(M)$. By examining all such length-2 paths, $A_{j+1}$ can be computed in
$O(t_j)$ time where $t_j$ is the sum of the degrees of the nodes in $A_j$. Therefore, all $A_j$ can
be found inductively in $O(\sum_j t_j)$ time. Since the sets $A_j$ are disjoint and there are $W$
edges in $\phi(G)$, $\sum_j t_j \le 2W$ and hence all $A_j$ can be found in $O(W)$ time.
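The layered search in the proof of Lemma 10 can be sketched as a breadth-first traversal. The code below is a minimal illustration on an arbitrary bipartite graph with a given maximum matching; the adjacency-dictionary encoding, the function name and the example are assumptions made here, not part of the paper.

```python
def even_alternating_flags(adj, mate):
    """Return A(x) for every node x: 0 if an even-length alternating path
    starts at x (matched edge first, ending at an unmatched node), else 1.
    adj maps each node to its neighbours; mate maps matched nodes to partners."""
    layer = [x for x in adj if x not in mate]      # A_0: the unmatched nodes
    reached = set(layer)
    while layer:
        next_layer = []
        for y in layer:                            # y belongs to A_j
            for z in adj[y]:                       # edge y-z of the length-2 path
                x = mate.get(z)                    # matched edge z-x completes it
                if x is not None and x not in reached:
                    reached.add(x)
                    next_layer.append(x)           # x belongs to A_{j+1}
        layer = next_layer
    return {x: 0 if x in reached else 1 for x in adj}

# Hypothetical example: a1 and a2 both see b1, and a1-b1 is the matching.
adj = {"a1": ["b1"], "a2": ["b1"], "b1": ["a1", "a2"]}
mate = {"a1": "b1", "b1": "a1"}
print(even_alternating_flags(adj, mate))
```

Each node enters a layer at most once and its adjacency list is scanned once, so the work is proportional to the number of edges, matching the $\sum_j t_j \le 2W$ accounting in the proof. In the example, removing b1 lowers the matching size while removing a1 or a2 does not, which is what the flags encode.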

5 Maximum Agreement Subtrees 

An evolutionary tree $T$ is a tree whose leaves are labeled by distinct symbols. Let $\mathcal{L}(T)$
be the set of labels used in $T$. For $L \subseteq \mathcal{L}(T)$, $L$ induces a subtree of $T$ whose nodes
are either the leaves labeled with $L$ or the least common ancestors of any two nodes
labeled with $L$, and whose edges preserve the ancestor-descendant relationship of $T$. A
maximum agreement subtree of $T_1$ and $T_2$ is defined as follows: Let $T_1'$ be a subtree of
$T_1$ induced by some subset of $\mathcal{L}(T_1) \cap \mathcal{L}(T_2)$. $T_2'$ is defined similarly for $T_2$. $T_1'$ and $T_2'$
are each called an agreement subtree of $T_1$ and $T_2$ if they have a leaf-label preserving
isomorphism. A maximum agreement subtree of $T_1$ and $T_2$ is an agreement subtree that
contains the largest possible number of labels. Denote by mast($T_1, T_2$) the number of
labels in a maximum agreement subtree of $T_1$ and $T_2$.

In the following subsections, we present two algorithms for computing a maximum
agreement subtree of $T_1$ and $T_2$, depending on whether $T_1$ and $T_2$ are rooted or unrooted.
Both algorithms run in $O(\ell^{1.5})$ time where $\ell = \max\{|T_1|, |T_2|\}$. To simplify our
discussion, we focus on computing the value of mast($T_1, T_2$).

5.1 Rooted Maximum Agreement Subtrees 

Let $T_1$ and $T_2$ be any rooted evolutionary trees. This section shows that with the new
matching algorithm, the algorithm of Farach and Thorup [9] for computing mast($T_1, T_2$)
can be improved to run in $O(\ell^{1.5})$ time.

Fact 2 (see [9]) Let $t_{n,W}$ be the required time to compute a maximum weight matching
of a bipartite graph with node count $n$ and total weight $W$. Then, the rooted maximum
agreement subtree of $T_1$ and $T_2$ can be found in $O(\ell^{1+o(1)} + t_{\ell,O(\ell)})$ time.

To utilize Fact 2, Farach and Thorup [9] applied the Gabow-Tarjan matching algorithm
[14] and showed that mast($T_1, T_2$) can be computed in $O(\ell^{1+o(1)} + \ell^{1.5} \log \ell) =
O(\ell^{1.5} \log \ell)$ time. This is the fastest known algorithm in the literature. We replace the
matching algorithm and give an improvement as shown in the following theorem.

Theorem 11. The rooted maximum agreement subtree of $T_1$ and $T_2$ can be computed
in $O(\ell^{1.5})$ time.

Proof. We replace the Gabow-Tarjan matching algorithm with the matching algorithm
described in §2. Then $t_{\ell,O(\ell)}$ is equal to $O(\ell^{1.5})$ instead of $O(\ell^{1.5} \log \ell)$. Thus, from
Fact 2, mast($T_1, T_2$) can be computed in $O(\ell^{1.5})$ time.

5.2 Unrooted Maximum Agreement Subtrees 

Consider two unrooted evolutionary trees $T_1$ and $T_2$. This section shows that mast($T_1, T_2$)
can be computed in $O(\ell^{1.5})$ time.

Similar to §5.1, computing an unrooted maximum agreement subtree is based on
bipartite matching algorithms. However, to improve the time complexity, in addition to
applying our new matching algorithms, we need to take advantage of the structure of the
bipartite graphs involved. Lemma 12 shows that our new matching algorithms can be
further adapted to exploit this structure.

Lemma 12. Consider a bipartite graph $G$. Let $u$, $v$ be any two nodes of $G$ and let $w$ be
the total weight of all the edges of $G$ that are not attached to $u$ and $v$. Then mwm(G)
can be computed in $O(\sqrt{n}\,w)$ time. Furthermore, mwm($G - \{u\}$) for all $u \in G$ can be
computed in the same time complexity.

Proof. To be shown in the full paper. 

We are ready to show the computation of mast($T_1, T_2$). Let match($w, w'$) and
cavity($w, w'$) be the time required to solve the maximum weight matching problem and
the all-cavity matching problem of a bipartite graph $G$ whose total weight is at most $w'$
and whose total weight after excluding the edges adjacent to two particular nodes is at
most $w$. The work of Kao et al. [20,22] can be interpreted as follows:

Fact 3 (see [20,22]) If match($w, w'$) and cavity($w, w'$) are $\Omega(w^{1+o(1)})$, then mast($T_1, T_2$)
can be computed in $T(\ell, \ell)$ time where

+ E (match(a;i,t") + cavity(a:i,t")) + T{£i,£') 

Y^xi=t 

Based on the Gabow-Tarjan matching algorithm [14] and the all-cavity matching
algorithm in [21], both match($w, w'$) and cavity($w, w'$) equal $O(w^{1.5} \log(ww'))$.
Therefore, by Fact 3, mast($T_1, T_2$) can be computed in $O(\ell^{1.5} \log \ell)$ time. If we apply
Lemma 12, both match($w, w'$) and cavity($w, w'$) are bounded as stated there, and the
time for finding mast($T_1, T_2$) is reduced to $O(\ell^{1.5})$.

Abstract. This paper considers a number of NP-complete problems, and provides
faster algorithms for solving them exactly. The solutions are based on a recursive
partitioning of the problem domain, and careful elimination of some of the
branches along the search without actually checking them. The time complexity of
the proposed algorithms is of the form $O(2^{\varepsilon n})$ for constant $0 < \varepsilon < 1$, where
$n$ is the output size of the problem. In particular, such algorithms are presented
for the Exact SAT and Exact Hitting Set problems (with $\varepsilon = 0.3212$), and for the
Exact 3SAT problem (with $\varepsilon = 0.2072$). Both algorithms improve on previous
ones proposed in the literature.



1 Introduction 

One of the main avenues in our struggle with NP-complete problems involves attempting 
to develop faster exhaustive-search algorithms for their solution. In particular, certain 
problems can be solved by algorithms whose complexity depends on specific parameters 
of the problem at hand, which may be considerably smaller than the input size. 

One natural parameter arises in problems $\Pi$ whose solution space consists of all $n$-bit
vectors, where a vector $x$ is a legal solution of the problem if it satisfies a certain
$n$-ary predicate $\mathrm{Pred}_\Pi(x)$. Clearly, every problem of this type can be solved optimally
by a naive exhaustive search algorithm, cycling through all $2^n$ possible solutions and
testing each of them individually. But for certain problems it may be possible to reduce
the search cost significantly below $2^n$ by applying clever ways of eliminating some of
the cases without actually checking them directly.
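As a baseline for comparison, the naive exhaustive search can be written in a few lines. The sketch below is illustrative only: the predicate interface, the encoding of a literal as a pair `(index, affinity)` and the sample clauses are assumptions made here.

```python
from itertools import product

def exhaustive_search(n, predicate):
    """Return the first n-bit vector accepted by predicate, or None."""
    for bits in product((0, 1), repeat=n):     # all 2^n candidate solutions
        if predicate(bits):
            return bits
    return None

def exactly_one_true(clauses):
    """Predicate for exact satisfiability; literal (i, a) is true when bits[i] == a."""
    return lambda bits: all(
        sum(1 for i, a in clause if bits[i] == a) == 1 for clause in clauses)

# Hypothetical instance over variables 0..3; (0, 0) denotes the negation of variable 0.
clauses = [[(0, 1), (1, 1), (2, 1)], [(0, 0), (1, 1), (3, 1)]]
print(exhaustive_search(4, exactly_one_true(clauses)))
```

Every candidate is tested individually, so the running time is proportional to $2^n$ times the cost of evaluating the predicate; the techniques discussed next aim to avoid visiting most of these candidates.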

One approach for achieving that is the recursion with elimination technique. This
method is based on a recursive partitioning of the solution space, carefully eliminating
some of the branches along the search by using special transformation rules fitted for
the problem at hand. Algorithms based on this approach were developed for a number
of problems, including an time algorithm for Maximum Independent Set 

(MIS) [12,5,9] and a number of variants of Satisfiability (SAT). In particular, a bound of 
$O(1.5^n)$ (or $O(2^{0.585n})$) was proved for 3SAT, the variant of SAT in which each clause
contains at most three literals [6,10]. For the general SAT problem there is an 0(2°'^^®") 
time and space solution [9], but the best known time bound using polynomial space is 
still $O(2^n)$. The complexity of certain algorithms developed for SAT can be bounded
in terms of two other parameters, namely $L$, the length of the input formula and $K$, the
number of clauses it contains. Specifically, SAT can be solved in time $O(2^{0.308K})$ or
$O(2^{0.105L})$.

A different approach to the efficient solution of NP-complete problems is the 2-table
method introduced in [11]. This method is based on splitting the $n$ output variables into
two sets of $n/2$ variables each, creating all possible $2^{n/2}$ partial solutions on each set,
and then scanning all possible combinations of one partial solution from each set in
a sorted cost order, ensuring that overall, only $O(2^{n/2})$ cases need to be tested. The
method thus yields an $O(2^{n/2})$-time, $O(2^{n/2})$-space algorithm. The class of problems
for which this method is applicable is given an axiomatic characterization in [11], and is
shown to include, in particular, the Knapsack, Exact Hitting Set (XHS) and Exact 3SAT
(X3SAT) problems.
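The idea of splitting the variables into two halves and scanning the two tables in sorted order can be illustrated on a subset-sum instance, the simplest Knapsack-like case. This is a sketch in the spirit of the description above, not the construction of [11]; the function name and the sample numbers are hypothetical.

```python
def two_table_subset_sum(weights, target):
    """Return a tuple of chosen weights summing to target, or None."""
    half = len(weights) // 2

    def table(items):
        # All 2^len(items) partial solutions as (sum, chosen items), sorted by sum.
        rows = [(0, ())]
        for w in items:
            rows += [(s + w, chosen + (w,)) for s, chosen in rows]
        return sorted(rows)

    low, high = table(weights[:half]), table(weights[half:])
    i, j = 0, len(high) - 1          # scan low upwards and high downwards
    while i < len(low) and j >= 0:
        total = low[i][0] + high[j][0]
        if total == target:
            return low[i][1] + high[j][1]
        if total < target:
            i += 1                   # need a larger partial sum from the first table
        else:
            j -= 1                   # need a smaller partial sum from the second table
    return None

print(two_table_subset_sum([3, 34, 4, 12, 5, 2], 9))
```

Each table holds $2^{n/2}$ rows and the scan touches each row at most once, so only $O(2^{n/2})$ combinations are examined after sorting, at the price of storing both tables.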

A generalization of this method to 4 tables, also presented in [11], succeeds in
reducing the space requirements of the algorithm to $O(2^{n/4})$, but does not improve the
time requirements. In fact, an open problem posed in [11] (which is still open to date,
to the best of our knowledge), is whether variants of the $k$-tables approach can yield
$o(2^{n/2})$-time algorithms for any of the problems mentioned above (cf. [2]).

While the current paper does not answer those questions, it does demonstrate that if
the answer for the above questions is negative, then the difficulty is not inherent to the
problems under consideration but rather to the $k$-table method itself. This is established
by providing faster-than-$2^{n/2}$ algorithms for solving a number of the problems handled
in [11]. The solutions are based on the recursion with elimination approach, and their time
complexity is $O(2^{\varepsilon n})$ for constant $0 < \varepsilon < 1/2$. Their space complexity is polynomial
in $n$. Hence the complexity of the 4-tables algorithm for these problems can be improved
upon, albeit perhaps not by a variant of the $k$-tables technique.

In particular, the following results are presented in this abstract. We first derive an
$O(2^{0.3212n})$ time algorithm for Exact SAT (XSAT), the variant of SAT in which the
solution assignment must satisfy that each clause in the formula has exactly one true
literal. We also give an $O(2^{0.2072n})$ time algorithm for X3SAT. By a straightforward
reduction to XSAT, we also get an $O(2^{0.3212n})$ algorithm for XHS.

Finally let us remark that very recently we have managed to improve the results for 
XSAT and XHS, obtaining an time algorithm for both problems [1]. 

2 An Algorithm for Exact 3SAT 

2.1 Terminology 

Let $X = \{x_1, \ldots, x_n\}$ be a set of Boolean variables. A truth assignment for $X$ is a
function $\tau : X \to \{0, 1\}$; we say that $u$ is "true" under $\tau$ if $\tau(u) = 1$, and "false"
otherwise. With each variable $u$ we associate two literals, $u$ and $\bar{u}$. The truth assignment
$\tau$ is expanded to literals by setting $\tau(\bar{u}) = 1$ if $\tau(u) = 0$, and $\tau(\bar{u}) = 0$ if $\tau(u) = 1$.

A clause over $X$ is a set of literals from $X$. A clause is satisfied by a truth assignment $\tau$
iff exactly one of its literals is true under $\tau$. A collection $\mathcal{C}$ of clauses over $X$ is satisfiable
iff there exists some truth assignment that simultaneously satisfies all the clauses in $\mathcal{C}$.

The Exact 3 Satisfiability (X3SAT) problem (called one-in-three 3SAT in [3]) is
defined as follows. A clause $C$ is called a $k$-clause, for integer $k \ge 1$, if it consists of $k$
literals. Given a set $X = \{x_1, \ldots, x_n\}$ of variables and a collection $\mathcal{C} = \{C_1, \ldots, C_m\}$
of 3-clauses over $X$, decide whether there exists a truth assignment for $X$, such that
each clause in $\mathcal{C}$ has exactly one true literal. Note that a clause may contain two or
more literals with the same variable. For example, the following are legal 3-clauses:
$\{x_1, x_1, x_1\}$, $\{x_1, x_1, x_2\}$, $\{x_1, \bar{x}_1, x_2\}$, etc. However, note that the first of those can
never be satisfied; in the second, the only satisfying assignment is $\tau(x_1) = 0$ and
$\tau(x_2) = 1$; and in the third, every satisfying assignment must have $\tau(x_2) = 0$.

Let us introduce the following terminology. For a clause $C$, let $X(C)$ denote its set
of variables. A variable occurring in a single clause is called a singleton. A variable $x$
appearing in the same affinity in all clauses in $\mathcal{C}$ (i.e., always as $x$, or always as $\bar{x}$) is
hereafter referred to as a constant variable (it is sometimes called a pure literal as well).
For a literal $\ell$, let $x(\ell)$ denote the corresponding variable, and let $V(\ell)$ denote its affinity,
namely, $V(\ell) = 1$ if $\ell = x$ and $V(\ell) = 0$ if $\ell = \bar{x}$. The opposite affinity is denoted by
$\bar{V}(\ell) = 1 - V(\ell)$. It is also convenient to use the notation $\bar{\ell}$, for a literal $\ell$, to signify the
opposite literal, i.e., $\bar{\ell} = \bar{x}$ if $\ell = x$, and $\bar{\ell} = x$ otherwise.



2.2 Canonical Instances



Our general strategy is based on simplifying the instance at hand via either reducing the
number of clauses or reducing the number of variables occurring in them. This is done
by using two basic operations, namely, fixing the truth assignment of certain variables,
or identifying certain variable pairs with each other. More formally, for a variable $x$ and
a bit $b \in \{0, 1\}$, we denote by Fix($x, b$) the restriction of the problem instance at hand to
truth assignments $\tau$ in which $\tau(x) = b$. Applying this operation allows us to eliminate
$x$ from the instance entirely, as follows. First, eliminate $x$ from every clause where it
occurs as a literal $\ell$ with affinity $V(\ell) = 1 - b$. (This makes the clause smaller, and
hence may help us to eliminate the clause as well, as is discussed shortly.) Next, observe
that every clause $C$ where $x$ occurs as a literal $\ell$ with affinity $V(\ell) = b$ is satisfied by $x$,
hence it can be discarded. However, note also that as $\ell$ is satisfied, all other literals in
$C$ must be falsified, which immediately forces us to apply Fix to those literals as well.
This chain reaction may proceed until no additional variables can be fixed.
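The Fix operation and its chain reaction can be sketched as follows. The encoding of a literal as a pair `(variable, affinity)`, the return convention and the contradiction checks are assumptions made for illustration; the sketch covers only the fixing rule described above and none of the later simplification claims.

```python
def fix(clauses, var, value):
    """Apply Fix(var, value) with its chain reaction.
    Returns (ok, remaining clauses, forced assignment); ok is False on contradiction."""
    assignment = {}
    pending = [(var, value)]
    clauses = [list(c) for c in clauses]
    while pending:
        x, b = pending.pop()
        if x in assignment:
            if assignment[x] != b:
                return False, [], assignment       # x forced both ways
            continue
        assignment[x] = b
        remaining = []
        for clause in clauses:
            hits = [lit for lit in clause if lit[0] == x]
            if not hits:
                remaining.append(clause)
                continue
            true_hits = [lit for lit in hits if lit[1] == b]
            if len(true_hits) > 1:
                return False, [], assignment       # clause satisfied twice
            if true_hits:
                # Clause satisfied by x: falsify every other literal, drop the clause.
                for y, a in clause:
                    if y != x:
                        pending.append((y, 1 - a))
            else:
                shrunk = [lit for lit in clause if lit[0] != x]
                if not shrunk:
                    return False, [], assignment   # no literal left to satisfy it
                remaining.append(shrunk)
        clauses = remaining
    return True, clauses, assignment

# Hypothetical instance: {x1, x2, x3} and {not x1, x4, x5}.
instance = [[(1, 1), (2, 1), (3, 1)], [(1, 0), (4, 1), (5, 1)]]
print(fix(instance, 1, 1))
```

In the example, fixing variable 1 to true satisfies the first clause, which forces variables 2 and 3 to false, and shrinks the second clause to a 2-clause over variables 4 and 5.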

Two specific cases of the Fix operation deserve special notation. For a literal $\ell$, we
write FixTrue($\ell$) to mean Fix($x(\ell), V(\ell)$), and FixFalse($\ell$) to mean Fix($x(\ell), \bar{V}(\ell)$).

Also, for two literals $\ell_1$ and $\ell_2$ (of different variables) we denote by Identify($\ell_2, \ell_1$)
the restriction of the instance to truth assignments $\tau$ in which $\tau(\ell_2) = \tau(\ell_1)$. Applying
this operation allows us to discard the variable $x(\ell_2)$ in our instance, by replacing
any occurrence of $\ell_2$ in a clause with $\ell_1$, and any occurrence of $\bar{\ell}_2$ with $\bar{\ell}_1$. As a result
of this operation, it might happen that in some clause $C$ (not necessarily the one that
caused us to apply the operation, but some other clause), there are now two copies of
the variable $x_1$. For example, this will happen if we applied Identify($x_1, x_2$) and had
some other clause $C = (x_1, x_2, x_3)$. So after the identification $C$ becomes $(x_1, x_1, x_3)$.
Note that these are two distinct occurrences of $x_1$, so $\tau(x_1)$ must be set to 0, otherwise
this clause is satisfied twice. Similarly, if we had a clause $C = (x_1, \bar{x}_2, x_3)$, then
after the identification it becomes $(x_1, \bar{x}_1, x_3)$, and necessarily we must fix $\tau(x_3)$ to
0. Again, such simplification may lead to a chain reaction, continuing until no further
simplifications can be performed.

Note that the Fix and Identify operations are both linear in the input size. 

We would like to identify the class of instances that cannot be simplified automatically.
Towards that end, a canonical instance of X3SAT is defined as an instance
enjoying the following properties:

(P1) Any two clauses in the instance share at most one common variable.

(P2) There is at most one singleton in every clause. 

(P3) In every clause there are exactly three different variables. 
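Properties (P1) to (P3) are easy to test mechanically. The following sketch checks them for clauses given as lists of `(variable, affinity)` pairs; this encoding and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def is_canonical(clauses):
    """Check (P1)-(P3) for clauses given as lists of (variable, affinity) literals."""
    var_sets = [{x for x, _ in clause} for clause in clauses]
    occurrences = Counter(x for vs in var_sets for x in vs)  # clauses per variable
    # (P3) every clause has exactly three different variables.
    if any(len(clause) != 3 or len(vs) != 3 for clause, vs in zip(clauses, var_sets)):
        return False
    # (P2) at most one singleton (variable occurring in a single clause) per clause.
    if any(sum(1 for x in vs if occurrences[x] == 1) > 1 for vs in var_sets):
        return False
    # (P1) any two clauses share at most one common variable.
    return all(len(a & b) <= 1 for a, b in combinations(var_sets, 2))

# Hypothetical instances: the first satisfies all three properties.
good = [[(1, 1), (2, 1), (3, 1)], [(1, 1), (4, 1), (5, 1)],
        [(2, 1), (4, 1), (6, 1)], [(3, 1), (5, 1), (6, 1)]]
bad = [[(1, 1), (2, 1), (3, 1)], [(1, 0), (2, 1), (4, 1)]]  # shares x1 and x2
print(is_canonical(good), is_canonical(bad))
```

A check of this kind is what the loop condition of a canonization procedure would evaluate before applying further simplifications.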

The above definition is justified by the following simple observations, which indicate
how non-canonical instances can be easily transformed into canonical ones. (Some
of the proofs are omitted from the extended abstract.)

Claim 1 If the instance contains a 1-clause, $C = \{\ell\}$, then the clause can be discarded
along with the variable $x(\ell)$.

Claim 2 If the instance contains a 2-clause, $C = \{\ell_1, \ell_2\}$, then the clause can be
discarded along with one of the variables.

Claim 3 Whenever one variable in a 3-clause $C$ is fixed, the instance can be simplified
by discarding at least one more variable and clause.

Claim 4 If the instance contains two clauses with literals based on the same variables,
then the instance can be either decided (by an $O(1)$-step test) or simplified by discarding
one clause and possibly some of the variables.

Proof. Let the two clauses be $C = \{\ell_1, \ell_2, \ell_3\}$ and $C' = \{\ell'_1, \ell'_2, \ell'_3\}$, with $x(\ell_i) = x(\ell'_i)$
for $i = 1, 2, 3$. Not all three affinities are identical, since the two clauses are different,
hence there are seven cases to consider.

If $\ell_j \ne \ell'_j$ for every $j = 1, 2, 3$ then the instance cannot be satisfied.

Now suppose exactly one variable occurs with the same affinity in the two clauses,
without loss of generality $\ell_3 = \ell'_3$. In this case, the truth assignment may not satisfy $\ell_3$,
since this will force all other literals in $C$ and $C'$ to be falsified, which cannot be done.
Hence any satisfying truth assignment $\tau$ is forced to set $\tau(x(\ell_3)) = \bar{V}(\ell_3)$, and we must
apply FixFalse($\ell_3$). Hence by Claim 3, it is possible to discard the two clauses and the
variable.

Finally suppose that exactly one variable occurs with opposite affinities in the two
clauses, without loss of generality $\ell'_3 = \bar{\ell}_3$. In this case the instance cannot be satisfied.

□

Claim 5 If the instance contains two clauses with two common variables, then the
instance can be either decided (by an $O(1)$-step test) or simplified by discarding one
clause and possibly some of the variables.

Proof. Let the two clauses be $C = \{\ell_1, \ell_2, \ell_3\}$ and $C' = \{\ell'_1, \ell'_2, \ell_4\}$, where $x(\ell_i) =
x(\ell'_i)$ for $i = 1, 2$ and $x(\ell_3) \ne x(\ell_4)$. The following three cases are possible:

1. $\ell'_1 = \ell_1$ and $\ell'_2 = \ell_2$. In this case any satisfying truth assignment may satisfy
at most one of $\ell_1$ and $\ell_2$. If $\ell_1$ or $\ell_2$ is satisfied, necessarily $\ell_3$ and $\ell_4$ must both be
falsified. Conversely, if neither $\ell_1$ nor $\ell_2$ is satisfied, necessarily $\ell_3$ and $\ell_4$ must both be
satisfied. Hence $\ell_3$ and $\ell_4$ must be given the same truth assignment, and we may apply
Identify($\ell_4, \ell_3$) and discard $x(\ell_4)$ and the clause $C'$.

2. $\ell'_1 = \ell_1$ and $\ell'_2 \ne \ell_2$. In this case no satisfying truth assignment may satisfy $\ell_1$,
because then one of the clauses will have two satisfied literals. Thus we may apply
FixFalse($\ell_1$). Hence by Claim 3, it is possible to discard the variables $x(\ell_3)$, $x(\ell_4)$ and
the two clauses.

3. $\ell'_1 \ne \ell_1$ and $\ell'_2 \ne \ell_2$. If both $\ell_1$ and $\ell_2$ are satisfied then $C$ is not satisfied, and
if both $\ell_1$ and $\ell_2$ are not satisfied then $C'$ is not satisfied. Hence any satisfying truth
assignment $\tau$ must satisfy exactly one of the pair $\ell_1, \ell_2$. This means that we must
apply Identify($\ell_2, \bar{\ell}_1$). Also, this forces any satisfying truth assignment to falsify both
$\ell_3$ and $\ell_4$, hence we may apply FixFalse($\ell_3$) and FixFalse($\ell_4$). Consequently, we may
discard the two clauses. □

Claim 6 If the instance contains a clause with two singletons or more, then the instance 
can be simplified by discarding that clause and two variables. 

Claim 7 If every clause in an instance contains a singleton, and all variables are con- 
stant, then the instance is satisfiable (and a satisfying truth assignment can be found in 
time linear in the input size). 

Now we present a procedure of time complexity $O(mn)$, which given a non-canonical
instance $\mathcal{C}$ of the problem transforms it to a canonical instance $\mathcal{C}'$.



Procedure CannonizeX3SAT($\mathcal{C}$)

While the instance $\mathcal{C}$ is not canonical repeat:

(a) For every 1-clause $C$ containing literal $\ell$ do:

Apply FixTrue($\ell$) and discard the variable $x(\ell)$. Discard any other clause $C'$ which
contains $x(\ell)$ as follows:

If the variable is in the same affinity in $C'$ as in $C$ (and thus satisfies the clause $C'$),
the other variables of $C'$ must not be satisfied, so the FixFalse operation can be
applied to each of them. If the variable is not in the same affinity in $C'$, then it can
be removed from $C'$, and a 2-clause would be left.

(b) For every 2-clause $C$ do:

Simplify $\mathcal{C}$ by Claim 2.

(c) For every clause containing a repeated variable $x_1$ do:

If $x_1$ is repeated three times with the same affinity (i.e., $\{x_1, x_1, x_1\}$), a contradiction
occurs, and $\mathcal{C}$ is decided. Else discard the repetition as follows:

1. If $x_1$ appears twice with the same affinity $b$ (i.e., $\{x_1, x_1, x_2\}$ or $\{x_1, x_1, \bar{x}_1\}$):
$x_1$ must be set to $\bar{b} = 1 - b$ (in order to avoid the clause from being satisfied twice).
Furthermore, the third variable must be set to 1 or 0 for satisfying the clause.

2. If $x_1$ appears twice, and in both affinities (i.e., $\{x_1, \bar{x}_1, x_2\}$):

The variable not repeated must be falsified.

(d) For each clause in $\mathcal{C}$ with more than one singleton, simplify $\mathcal{C}$ by Claim 6.

(e) For every two clauses with two or more common variables do:

Decide or simplify $\mathcal{C}$ by Claims 4 and 5.

End_While

The output of Procedure CannonizeX3SAT($\mathcal{C}$) is of the form $(b, \mathcal{C}')$ where $b = 0$ if a
contradiction occurred and $b = 1$ if the canonization ended successfully,
and $\mathcal{C}'$ is the resulting canonical instance if $b = 1$.

Each iteration of the algorithm takes $O(m)$ time and discards at least one variable,
and so the algorithm terminates in time $O(mn)$.

Lemma 8. A non-canonical instance $\mathcal{C}$ can be transformed into a canonical instance
$\mathcal{C}'$ in time $O(mn)$, with $\mathcal{C}'$ containing (strictly) fewer variables than $\mathcal{C}$.



2.3 A Recursive Algorithm 

Let $\mathcal{C}$ be an instance of the X3SAT problem. The recursive algorithm X3SAT($\mathcal{C}$) for
finding a satisfying truth assignment $\tau$ or a contradiction operates as follows. Let us
first describe the basic recursive procedure Test($\mathcal{C}, x, v$), where $\mathcal{C}$ is an instance, $x$ is a
variable and $v \in \{0, 1\}$.



Procedure Test($\mathcal{C}, x, v$)

1. Apply Fix($x, v$).

2. Transform $\mathcal{C}$ into a canonical $\mathcal{C}'$ by invoking Procedure
$(b, \mathcal{C}') \leftarrow$ CannonizeX3SAT($\mathcal{C}$).

3. If $b = 0$ (a contradiction occurs) then return 0.

4. Else if $\mathcal{C}' = \emptyset$ (all variables are discarded) then return 1.

5. Otherwise, recursively invoke $b' \leftarrow$ X3SAT($\mathcal{C}'$) and return $b'$.



Main Algorithm X3SAT($\mathcal{C}$)

1. If every clause contains a singleton variable and all variables are constant then return 1.

2. Else choose a variable $x_i$ according to the following steps.

a) If there is a non-constant variable $x_i$ in $\mathcal{C}$, choose it.

b) Else pick a clause with no singleton in it, and choose one of its variables.

3. Invoke the recursive procedure $b' \leftarrow$ Test($\mathcal{C}, x_i, 1$).

4. If $b' = 1$, then return 1 and halt. Otherwise invoke $b' \leftarrow$ Test($\mathcal{C}, x_i, 0$) and return
$b'$.
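The branch-and-simplify structure of the algorithm can be illustrated by a much simplified sketch. It is not the algorithm above: canonization is replaced by a plain propagation of forced values, the choice of the branching variable ignores steps 2a and 2b, and the `(variable, affinity)` encoding and all function names are assumptions made here.

```python
def propagate(clauses, assignment):
    """Apply forced fixes; return (open clauses, assignment) or None on contradiction."""
    assignment = dict(assignment)
    while True:
        forced, remaining = {}, []
        for clause in clauses:
            true_count = sum(1 for x, a in clause if assignment.get(x) == a)
            free = [(x, a) for x, a in clause if x not in assignment]
            if true_count > 1 or (true_count == 0 and not free):
                return None                              # clause cannot be satisfied
            if true_count == 1:
                wanted = [(x, 1 - a) for x, a in free]   # falsify the rest
            elif len(free) == 1:
                wanted = free                            # last literal must be true
            else:
                remaining.append(clause)                 # still undecided
                continue
            for x, b in wanted:
                if forced.setdefault(x, b) != b:
                    return None                          # x forced both ways
        if not forced:
            return remaining, assignment
        assignment.update(forced)
        clauses = remaining

def x3sat(clauses, assignment=None):
    """Return a satisfying assignment (dict variable -> bit) or None."""
    state = propagate(clauses, assignment or {})
    if state is None:
        return None
    remaining, assignment = state
    if not remaining:
        return assignment
    x = next(v for v, _ in remaining[0] if v not in assignment)
    for value in (1, 0):                                 # try x = 1 first, then x = 0
        solution = x3sat(remaining, {**assignment, x: value})
        if solution is not None:
            return solution
    return None

# Hypothetical instance: {x1,x2,x3}, {not x1,x4,x5}, {x2,x4,x6}.
instance = [[(1, 1), (2, 1), (3, 1)], [(1, 0), (4, 1), (5, 1)], [(2, 1), (4, 1), (6, 1)]]
print(x3sat(instance))
```

As in the procedure Test, each branch fixes one variable and lets the forced consequences remove further variables before recursing; the refined variable choice and canonization rules of the paper are what yield the stated running time, which this sketch does not attempt to match.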







L. Drori and D. Peleg 



2.4 Analysis 

In the full paper we prove the following theorem. 

Theorem 9. The recursive algorithm X3SAT($\mathcal{C}$) solves the X3SAT problem in time
complexity $O(m \cdot 2^{0.2072n})$, where $n$ and $m$ are the number of variables and clauses of $\mathcal{C}$,
respectively.

In order to illustrate the flavor of the proof, let us provide a simpler analysis for a 
weaker bound, namely, 0{m ■ 

Suppose that the algorithm is applied to an instance $\mathcal{C}$ containing $n$ variables. First,
note that if the condition of step 1 holds, then the procedure terminates correctly by
Claim 7. Hence from now on we restrict our attention to the case where this condition
is not met. In this case, the algorithm picks a variable $x_i$, and tests both possible ways
of fixing its truth value, namely 0 and 1. In both cases, fixing the value of $x_i$ yields a
non-canonical instance, which can be simplified further, by identifying some variables
and fixing the truth values of some others. Once a canonical instance $\mathcal{C}'$ is obtained, it
is handled recursively by the algorithm. The crucial observation is that in each case, the
resulting instance $\mathcal{C}'$ has fewer variables than the original $\mathcal{C}$. Analyzing the number of
variables discarded in each step is the central component of our analysis, and it yields
the recurrence equations governing the complexity of the algorithm.

Let $f(m, n)$ denote the worst-case time complexity of Algorithm X3SAT on $m$-clause,
$n$-variable instances. Note that in the case of 3-clauses, the maximal number of
clauses in an instance is $O(n^3)$, hence the input length is $m \log n = O(n^3 \log n)$.

The number of variables discarded at each step of the recursion is analyzed as follows. 
Recall that step 2 chooses some test variable Xi. It is chosen by step 2a or step 2b. We 
shall next examine these two cases separately. 



Test Variable Chosen in Step 2a. Let us first provide a straightforward analysis for the
case where $x_i$ was chosen by step 2a, namely, it is a non-constant variable. Given the
linear dependence of $f(m, n)$ on $m$, we shall henceforth ignore $m$, and analyze only
the dependence of the complexity on $n$, denoted by the function $f(n)$. Let $C_1$ and $C_2$
be two clauses in which $x_i$ appears in opposing affinities. Without loss of generality let
$C_1 = \{x_1, \ell_2, \ell_3\}$ and $C_2 = \{\bar{x}_1, \ell_4, \ell_5\}$.

When Test($\mathcal{C}, x_1, 1$) is invoked, $x_1$ is fixed to 1. Hence $C_1$ is satisfied, so we may apply
FixFalse($\ell_2$) and FixFalse($\ell_3$). Moreover, by Claim 2 we apply Identify($\ell_5, \bar{\ell}_4$).
Thus $x_1$, $x(\ell_2)$, $x(\ell_3)$ and $x(\ell_5)$ are discarded.

Likewise when Test($\mathcal{C}, x_1, 0$) is invoked, $x_1$ is fixed to 0. Hence $C_2$ is satisfied,
so we may apply FixFalse($\ell_4$) and FixFalse($\ell_5$), and again, by Claim 2 we apply
Identify($\ell_3, \bar{\ell}_2$). Thus $x_1$, $x(\ell_3)$, $x(\ell_4)$ and $x(\ell_5)$ are discarded.

It follows that the recurrence equations governing the complexity of Algorithm 
X3SAT are the following. By the above analysis we get that in case the variable $x_i$ 
is always chosen by step 2a, then $f(m, n)$ satisfies 

$$f(m,n) \le \begin{cases} O(1), & \text{step 1 applies,} \\ O(mn) + 2f(m, n-4), & \text{otherwise.} \end{cases}$$

This recurrence solves to give $f(m,n) = O(m \cdot 2^{n/4})$. The improved result of Theorem 
9 is based on refining this analysis by breaking the discussion further into subcases, and 
deriving an improved recurrence equation for each case. Given the linear dependence of 
$f(m, n)$ on $m$, we shall henceforth ignore $m$, and analyze only the dependence of the 
complexity on $n$, denoted by the function $f(n)$. Hence the corresponding recurrence is 

$$f(n) \le 2f(n-4) \qquad (1)$$



Test Variable Chosen in Step 2b. Now let us analyze the case where $x_i$ was chosen by 
step 2b of the algorithm X3SAT. In this case, each variable appears in all of its clauses 
with the same affinity. Without loss of generality let us assume all variables appear in the 
positive form. Furthermore, $x_i$ was chosen from a clause which contained no singleton. 
Let $C_1 = \{x_1, x_2, x_3\}$ be the clause and $x_1$ be the variable chosen, and let $C_2$ be some 
other clause containing $x_1$. Since $C$ is canonical, two clauses may not have two or more 
common variables, by property (P1). Hence let $C_2 = \{x_1, x_4, x_5\}$. When Test($C$, $x_1$, 1) 
is invoked, $x_2$, $x_3$, $x_4$ and $x_5$ are falsified, resulting in five fixed variables. 

When Test($C$, $x_1$, 0) is invoked, Identify($x_3$, $\bar{x}_2$) is imposed by $C_1$, and 
Identify($x_5$, $\bar{x}_4$) by $C_2$. Therefore three variables are discarded. 

By the above analysis we get that in case the variable $x_i$ is always chosen by step 
2b, then $f(n)$ satisfies 

$$f(n) \le f(n-3) + f(n-5) \qquad (2)$$

It follows that the time complexity of Algorithm X3SAT is bounded by a function 
$f(n)$ which obeys inequalities (1) and (2). These inequalities both solve to give $f(n) = 
2^{\varepsilon n}$ for some constant $0 < \varepsilon < 1$, whose value is determined by the specific inequalities. 
In particular, the constraints imposed by the specific inequalities at hand can be calculated 
to be the following: 

$$f(n) \le 2f(n-4) \;\Rightarrow\; \varepsilon \ge 0.25,$$
$$f(n) \le f(n-3) + f(n-5) \;\Rightarrow\; \varepsilon \ge 0.2557.$$

Hence the bound achieved on the worst-case time complexity of our algorithm is 

$$O(m \cdot 2^{0.2557\,n}).$$
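
The two exponents quoted above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a solution of the form $f(n) = x^n$, so that a recurrence $f(n) = \sum_i f(n - d_i)$ becomes $\sum_i x^{-d_i} = 1$, and finds the root by bisection. The function name `growth_exponent` is a hypothetical helper introduced only for this illustration.

```python
import math

def growth_exponent(deltas, tol=1e-12):
    """Return eps such that f(n) = 2**(eps*n) solves f(n) = sum_i f(n - d_i).

    Hypothetical helper: `deltas` lists the number of variables discarded in
    each branch, e.g. [4, 4] for f(n) = 2 f(n-4).
    """
    def excess(x):
        # Characteristic equation sum_i x**(-d_i) = 1, written as excess(x) = 0.
        # The left side is strictly decreasing in x for x > 0.
        return sum(x ** (-d) for d in deltas) - 1.0

    # excess(1) = len(deltas) - 1 >= 0 and excess(len(deltas) + 1) < 0,
    # so the unique root lies in this interval.
    lo, hi = 1.0, float(len(deltas) + 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2)

eps_2a = growth_exponent([4, 4])  # recurrence (1): f(n) <= 2 f(n-4)
eps_2b = growth_exponent([3, 5])  # recurrence (2): f(n) <= f(n-3) + f(n-5)
print(eps_2a, eps_2b)
```

The larger of the two exponents governs the overall bound, which is why recurrence (2) determines the constant in the final running time.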



3 An Algorithm for Exact SAT 

3.1 Terminology 

The terminology of XSAT is similar to that of X3SAT as explained in Section 2.1. The 
only difference is in the definition of the collection C, which in X3SAT is restricted to 
have 3-clauses; in XSAT this restriction no longer holds. 



3.2 Canonical Instances 

Our general strategy is the same as that of X3SAT as presented in Section 2.2. We shall 
again use the Fix, FixTrue, FixFalse and Identify operations. We shall also use $\ell'_i$ to 
denote a literal on the same variable as $\ell_i$ (i.e., $x(\ell'_i) = x(\ell_i)$). Furthermore, Claims 1, 
2, 3 and 7 still hold. 

A canonical instance of XSAT is defined as an instance enjoying the following 
properties: 

(Q1) Any two 3-clauses share at most one common variable. 

(Q2) No clause contains fewer than three variables. 

(Q3) No clause contains the same variable more than once. 

(Q4) There is at most one singleton in every clause. 

(Q5) There are no two constant variables such that each clause either contains both or 
contains neither. 

(Q6) There are no two clauses with the same number of variables $r$ in which $r - 1$ 
variables appear in both clauses and with the same affinity. 

(Q7) There are no two clauses such that all variables of one appear in the other. 

This definition is again motivated by a sequence of claims, including claims 1, 2, 3, 
7 of the previous section, and the simple observations presented next. 

Claim 10 If the instance contains two constant variables such that each clause either 
contains both variables or contains neither, then the instance can be simplified by 
discarding one of the variables. 

Claim 11 If the instance contains two clauses with the same number of variables $r \ge 3$, 
in which $r - 1$ variables appear in both clauses and with the same affinity, then the 
instance can be simplified by discarding a variable and one of the clauses. 

Proof. Let the two clauses be $C_1 = \{\ell_1, \ldots, \ell_{r-1}, \ell_r\}$ and $C_2 = \{\ell_1, \ldots, \ell_{r-1}, \ell_t\}$, 
where $x(\ell_r) \ne x(\ell_t)$, $r \ge 3$ and $|C_1| = |C_2|$. Any satisfying truth assignment must 
either satisfy exactly one of the literals $\ell_1, \ldots, \ell_{r-1}$ and falsify $\ell_r$ and $\ell_t$, or falsify all 
the literals $\ell_1, \ldots, \ell_{r-1}$ and satisfy both $\ell_r$ and $\ell_t$. Hence $\ell_r$ and $\ell_t$ must have the same 
truth assignment, and it is possible to apply Identify($\ell_r$, $\ell_t$) and discard $x(\ell_t)$ and one 
of the clauses. ∎
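
The argument for Claim 11 can be confirmed exhaustively for small $r$. The sketch below is only an illustration under stated assumptions: literals are encoded as hypothetical `(variable, positive)` pairs, all shared literals are taken to be positive (negating a shared literal in both clauses does not change the argument), and the function names are invented for this example.

```python
from itertools import product

def exactly_one(clause, assignment):
    """True if exactly one literal of the clause is true under the assignment.

    A literal is a pair (variable index, positive); positive=False means the
    variable appears negated.
    """
    return sum(assignment[var] == positive for var, positive in clause) == 1

def claim11_holds(r):
    """Check that l_r and l_t agree in every assignment exactly satisfying both clauses."""
    shared = [(i, True) for i in range(r - 1)]   # l_1, ..., l_{r-1}
    c1 = shared + [(r - 1, True)]                # last literal l_r
    c2 = shared + [(r, True)]                    # last literal l_t on another variable
    for bits in product([False, True], repeat=r + 1):
        if exactly_one(c1, bits) and exactly_one(c2, bits):
            if bits[r - 1] != bits[r]:
                return False
    return True

print(all(claim11_holds(r) for r in range(3, 8)))
```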

Claim 12 If the instance contains a clause with two singletons or more, then the instance 
can be simplified by discarding at least one variable. 

Lemma 13. [Subsets lemma] If the instance contains two clauses $C_1$ and $C_2$ such that 
all variables of $C_1$ appear in $C_2$ (i.e., $X(C_1) \subseteq X(C_2)$), then the instance can be either 
decided (by an $O(1)$-step test) or simplified by discarding one clause and some of the 
variables. 

Proof. Let the two clauses be $C_1 = \{\ell_1, \ell_2, \ldots, \ell_m\}$ and $C_2 = \{\ell'_1, \ell'_2, \ldots, \ell'_k\}$ where 
$m \le k$. The affinity of the common variables can be classified according to the following 
cases: 

- There are three or more variables with opposite affinities in $C_1$ and $C_2$. Then in 
every truth assignment some of these variables satisfy $C_1$ and the rest satisfy $C_2$. 
Since there are at least three such variables overall, at least one of the clauses is 
satisfied twice. Consequently, a contradiction occurs. 

- There are exactly two variables with opposite affinities in $C_1$ and $C_2$. Without 
loss of generality let $x(\ell_1)$ and $x(\ell_2)$ be the variables. Identify($\ell_2$, $\bar{\ell}_1$) may be 
applied, because otherwise one of the clauses is satisfied twice. Furthermore, all 
other variables in $C_1$ and $C_2$ must be falsified. Hence $x(\ell'_2), \ldots, x(\ell'_k)$ and the two 
clauses are discarded. 

- There is exactly one variable with opposite affinities in $C_1$ and $C_2$. Without loss 
of generality let this variable be $x(\ell_1)$. If $k = m$, then $x(\ell_1)$ satisfies one of the 
clauses, let it be $C_1$. Thus any other variable satisfying $C_2$ causes $C_1$ to be satisfied 
twice. Hence a contradiction occurs. If $k > m$, then any satisfying truth assignment 
must apply FixTrue($\ell_1$), otherwise $C_2$ is satisfied twice. Hence it is possible to falsify 
$\ell_2, \ldots, \ell_m$, and $C_2$ must be satisfied by one of the remaining variables $\ell'_{m+1}, \ldots, \ell'_k$. 

- $\ell_i = \ell'_i$ for each $1 \le i \le m$. In this case, the variable satisfying $C_1$ will necessarily 
satisfy $C_2$ as well. Thus $\ell'_{m+1}, \ldots, \ell'_k$ must be falsified, and $C_2$ may be discarded. 
∎

Now we shall present a procedure of time complexity $O(mn)$, which given a 
non-canonical instance $C$ transforms it to a canonical instance $C'$. 



Procedure CannonizeXSAT(C) 

While the instance C is not canonical repeat: 

(a) For every 1-clause $C$ containing literal $\ell$ do: 

Apply FixTrue($\ell$) and discard the variable $x(\ell)$. Discard or simplify any other 
clause $C'$ which contains $x(\ell)$ as follows: 

If the variable is in the same affinity in $C'$ as in $C$ (and thus satisfies the clause $C'$), 
the other variables of $C'$ must not be satisfied, so the FixFalse operation can be 
applied to each of them. If the variable is not in the same affinity in $C'$, then it can 
be removed from $C'$. 

(b) For every 2-clause $C$ do: 

Simplify the instance by Claim 2. 

(c) For every clause containing a repeated variable $x_1$ do: 

If $x_1$ is repeated at least twice in each affinity (i.e., $\{x_1, x_1, \bar{x}_1, \bar{x}_1, \ldots\}$), a 
contradiction occurs, and the instance is decided. Else discard the repetition as follows: 

1. If $x_1$ appears at least twice in one affinity $b$ and does not appear at all in the other 
affinity, $\bar{b}$, then there are two cases to consider. If there are no other variables 
in the clause, a contradiction occurs. Else set $x_1$ to $\bar{b}$ and remove it from the 
clause. 

2. If $x_1$ appears at least twice in one affinity $b$ and exactly once in the other affinity, 
$\bar{b}$, then again set $x_1$ to $\bar{b}$. Furthermore, apply FixFalse to all other literals in the 
clause and discard the clause. 

3. If $x_1$ appears exactly once in each affinity (i.e., $\{x_1, \bar{x}_1, x_2\}$), apply FixFalse 
to all other literals in the clause and discard the clause. 

(d) For every two clauses which contain three variables each and share two or more 
common variables do: 

Decide or simplify the instance by Claims 4 and 5. 







(e) For each clause in C which has more than one singleton in it do: 

Simplify the instance by Claim 12. 

(f) For every two constant variables such that each clause in C either contains both 
variables or contains neither do: 

Simplify the instance by Claim 10. 

(g) For every two clauses with the same number of variables $r$ in which $r - 1$ variables 
appear in both clauses with the same affinity do: 

Simplify the instance by Claim 11. 

(h) For every two clauses where all variables of one appear in the other do: 

Decide or simplify the instance by Lemma 13. 

End_While 

The output of CannonizeXSAT(C) is of the form $(b, C')$ with the same values as 
defined in X3SAT. Each iteration of the algorithm takes $O(m)$ time and discards at 
least one variable, and so the algorithm terminates in time $O(mn)$, hence Lemma 8 still 
applies. 
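
As an illustration of how a single rule of the procedure operates, the following sketch implements only step (a), repeated until no 1-clause remains. It is not the authors' implementation: steps (b) to (h) are omitted, the literal encoding as `(variable, positive)` pairs and the name `propagate_unit_clauses` are assumptions made for this example, and no attempt is made to reach the $O(mn)$ bound.

```python
def propagate_unit_clauses(clauses):
    """Repeat step (a) until no 1-clause is left.

    clauses: list of clauses, each a list of (variable, positive) literals,
    under exactly-one semantics. Returns (fixed, remaining_clauses), or None
    if a contradiction is detected.
    """
    fixed = {}

    def fix(var, value):
        # Record var = value; report False if it clashes with an earlier fixing.
        return fixed.setdefault(var, value) == value

    clauses = [list(c) for c in clauses]
    progress = True
    while progress:
        progress = False
        remaining = []
        for clause in clauses:
            true_lits = [(v, p) for v, p in clause if v in fixed and fixed[v] == p]
            open_lits = [(v, p) for v, p in clause if v not in fixed]
            if len(true_lits) > 1:
                return None                  # clause satisfied twice
            if len(true_lits) == 1:
                # Clause already satisfied: FixFalse on every other literal,
                # then discard the clause.
                for v, p in open_lits:
                    if not fix(v, not p):
                        return None
                progress = progress or bool(open_lits)
                continue
            if not open_lits:
                return None                  # every literal is false
            if len(open_lits) == 1:
                # A 1-clause (possibly created by earlier removals): FixTrue.
                v, p = open_lits[0]
                fix(v, p)
                progress = True
                continue
            remaining.append(open_lits)      # false literals have been removed
        clauses = remaining
    return fixed, clauses

example = [[(0, True)],
           [(0, True), (1, True), (2, True)],
           [(0, False), (1, False), (3, True)]]
print(propagate_unit_clauses(example))
```

In the example, fixing $x_0$ to true satisfies the second clause, which forces $x_1$ and $x_2$ to false; the negated $x_1$ then satisfies the third clause, which forces $x_3$ to false.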

3.3 A Recursive Algorithm 

Let $C$ be an instance. The recursive algorithm XSAT(C) for finding a satisfying truth 
assignment $\tau$ or a contradiction is identical to that of Section 2.3, except that it uses the more 
general canonization procedure CannonizeXSAT instead of procedure 
CannonizeX3SAT of Section 2.3, and step 2b for choosing a variable is changed to 
picking the smallest clause with no singleton in it, instead of just any such clause as in 
Section 2.3. 

In the full paper we prove the following theorem. 

Theorem 14. The recursive algorithm XSAT(C) has time complexity $O(m \cdot 2^{0.3212\,n})$, 
where $n$ and $m$ are the number of variables and clauses of $C$, respectively. 



4 An Algorithm for Exact Hitting Set 

The Exact Hitting Set (XHS) problem is defined as follows. Given a Boolean matrix $M$ 
with $n$ rows and $m$ columns, decide whether there exists a subset of the rows such that 
each column $j$ in $M$ has exactly one row $i$ in the subset with $M_{ij} = 1$. 
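
The definition can be made concrete with a brute-force check on small matrices. This is a sketch for illustration only: it enumerates all $2^n$ row subsets and is unrelated to the recursive algorithm of the paper. The helper `columns_as_clauses`, a name introduced here, collects for each column the rows holding a 1 in it, so that a row subset is a solution exactly when it meets each such set once.

```python
from itertools import product

def columns_as_clauses(matrix):
    """For each column j, the set of rows i with matrix[i][j] == 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [{i for i in range(n_rows) if matrix[i][j] == 1}
            for j in range(n_cols)]

def exact_hitting_set(matrix):
    """Return a set of rows hitting every column exactly once, or None."""
    clauses = columns_as_clauses(matrix)
    for choice in product([0, 1], repeat=len(matrix)):
        rows = {i for i, bit in enumerate(choice) if bit}
        if all(len(rows & clause) == 1 for clause in clauses):
            return rows
    return None

solvable = [[1, 0, 1],
            [0, 1, 0],
            [1, 1, 0]]
unsolvable = [[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]]
print(exact_hitting_set(solvable), exact_hitting_set(unsolvable))
```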

Theorem 15. There exists a recursive algorithm of time complexity $O(m \cdot 2^{0.3212\,n})$ for 
XHS. 

Proof. The proof is by a simple reduction from XHS to XSAT. Given an instance $M$ of 
the XHS problem, construct an instance $C$ of XSAT as follows. Define a variable $x_i$ for 
each row $1 \le i \le n$, and for each column $j$ create a clause $C_j = \{x_i \mid M_{ij} = 1\}$. Note 
that the number of variables in the instance $C$ equals the number of rows in the given 
matrix $M$. It is straightforward to verify that XHS($M$) = True iff XSAT($C$) = True. 
∎



Abstract. We propose a new algorithmic technique for constructing combinatorial 
designs such as t-designs and packings. The algorithm is based on polyhedral 
theory and employs the well-known branch-and-cut approach. Several properties 
of the designs are studied and used in the design of our algorithm. A polynomial-time 
separation algorithm for clique facets is developed for a class of designs, 
and an isomorph rejection algorithm is employed in pruning tree branches. Our 
implementation is described and experimental results are analysed. 



1 Introduction 

Computational methods have been important in combinatorial design theory. They are 
useful for constructing “starter” designs for recursive constructions for infinite families 
[17] and they also play a central role in applications [2]. Techniques that have been 
widely used include backtracking, several local search methods (such as hill-climbing, 
simulated annealing, genetic algorithms) as well as several algorithms using $tk$-matrices 
(see [8,17]). In [11], we propose the general approach of employing polyhedral 
algorithms to combinatorial design problems. In the present article, we design and implement 
the first branch-and-cut algorithm for constructing $t$-designs and packings. 

A $t$-$(v, k, \lambda)$ design is a pair $(V, \mathcal{B})$ where $V$ is a $v$-set and $\mathcal{B}$ is a collection of 
$k$-subsets of $V$ called blocks such that every $t$-subset of $V$ is contained in exactly $\lambda$ blocks 
of $\mathcal{B}$. A $t$-$(v, k, \lambda)$ packing design is defined by replacing the condition “in exactly $\lambda$ 
blocks” in the above definition by “in at most $\lambda$ blocks”. The packing number, denoted by 
$D_\lambda(v, k, t)$, is the maximum number of blocks in a $t$-$(v, k, \lambda)$ packing design. Important 
classes of $t$-designs are Steiner systems ($t$-$(v, k, 1)$ designs) and balanced incomplete 
block designs (2-$(v, k, \lambda)$ designs); designs with $k = t + 1$ are also of special interest, 
for instance Steiner triple systems and Steiner quadruple systems. Central questions 
in combinatorial design theory are the existence of $t$-designs and the determination of 
packing numbers. It is well known that if a $t$-$(v, k, \lambda)$ design exists, it is a maximum 
packing. Thus, our algorithm constructs designs by searching for maximal packings. In 
[18], a polyhedral algorithm specific for 2-designs is proposed, using a different integer 
programming formulation. 
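
The two definitions translate directly into a counting check. The sketch below is an illustration, with hypothetical function names, that counts how many blocks contain each $t$-subset; it uses the Fano plane, the well-known 2-(7, 3, 1) design, as a test case.

```python
from itertools import combinations
from collections import Counter

def coverage(blocks, t):
    """Count, for each t-subset, how many blocks contain it."""
    counts = Counter()
    for block in blocks:
        for sub in combinations(sorted(block), t):
            counts[sub] += 1
    return counts

def is_packing(v, k, t, lam, blocks):
    """Every block is a k-subset of {1..v}; every t-subset lies in at most lam blocks."""
    points = set(range(1, v + 1))
    if any(len(set(b)) != k or not set(b) <= points for b in blocks):
        return False
    return all(c <= lam for c in coverage(blocks, t).values())

def is_design(v, k, t, lam, blocks):
    """As is_packing, but every t-subset lies in exactly lam blocks."""
    if not is_packing(v, k, t, lam, blocks):
        return False
    counts = coverage(blocks, t)
    return all(counts[sub] == lam for sub in combinations(range(1, v + 1), t))

fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
print(is_design(7, 3, 2, 1, fano), is_packing(7, 3, 2, 1, fano[:-1]))
```

Dropping a block from the Fano plane leaves a packing that is no longer a design, which mirrors the remark that a design, when it exists, is a maximum packing.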

The main contribution of this paper is the design and implementation of a new 
branch-and-cut algorithm for packings and $t$-designs. We incorporate into the algorithm 
some aspects specific to combinatorial design problems such as a new clique separation 
based on design properties and an isomorph rejection scheme to remove isomorphic 
subproblems from the branch-and-cut tree. The effects of various parameters on performance 
are analyzed through experiments. Our method is competitive with other techniques for 
generating designs including backtracking and randomized search. Our algorithm 
produced new maximal cyclic $t$-$(v, k, 1)$ packings for $t = 2, 3, 4, 5$, $k = t + 1, t + 2$ and 
small $v$. We hope this article will be a starting point for further research in the application 
of similar techniques to combinatorial design problems. 

J. Nešetřil (Ed.): ESA’99, LNCS 1643, pp. 462–475, 1999. 



2 Polyhedra for Designs and Packings 



In this section, we present integer programming formulations for $t$-designs and their 
extensions to packings. Similar formulations for designs with prescribed automorphism 
groups can be employed (see [11]). 

We concentrate on designs without repeated blocks, known as simple designs. Such a 
design can be represented by an incidence vector, that is, a 0-1 vector $x \in \mathbb{R}^{\binom{v}{k}}$ indexed 
by the $k$-subsets of a $v$-set and such that $x_S = 1$ if and only if $S$ is a block of the design. 
The polyhedron associated with a design is defined as the convex hull of the incidence 
vectors of all designs of that kind. Let us denote by $T_{t,v,k,\lambda}$ and $P_{t,v,k,\lambda}$ the polyhedra 
associated with the $t$-$(v, k, \lambda)$ designs and packing designs, respectively. 

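Let $W^v_{tk}$ be the $\binom{v}{t} \times \binom{v}{k}$ matrix with rows indexed by the $t$-subsets and columns 
by the $k$-subsets of a $v$-set, and such that $[W^v_{tk}]_{T,K} = 1$ if $T \subseteq K$ and $[W^v_{tk}]_{T,K} = 0$ 
otherwise. For a detailed study of these matrices and their role in design theory see [4]. 

It is easy to see that $t$-$(v, k, \lambda)$ designs correspond to the solutions $x \in \mathbb{R}^{\binom{v}{k}}$ of 

$$\text{(DP)} \quad \begin{cases} W^v_{tk}\, x = \lambda \mathbf{1}, \\ x \in \{0,1\}^{\binom{v}{k}}. \end{cases}$$

The maximum packings correspond to solutions $x \in \mathbb{R}^{\binom{v}{k}}$ of 

$$\text{(PDP)} \quad \begin{cases} \text{maximize} & \mathbf{1}^T x \\ \text{subject to} & W^v_{tk}\, x \le \lambda \mathbf{1}, \\ & x \in \{0,1\}^{\binom{v}{k}}. \end{cases}$$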

Based on these integer programming formulations, we rewrite the design polytopes as 
$T_{t,v,k,\lambda} = \mathrm{conv}\{x \in \{0,1\}^{\binom{v}{k}} : W^v_{tk}\, x = \lambda \mathbf{1}\}$ and $P_{t,v,k,\lambda} = \mathrm{conv}\{x \in \{0,1\}^{\binom{v}{k}} : 
W^v_{tk}\, x \le \lambda \mathbf{1}\}$. For $\lambda = 1$ these are special cases of set partitioning and set packing, 
respectively. For $\lambda > 1$, the polytope $P_{t,v,k,\lambda}$ is the polytope for independent sets 
of an independence system. The general problems of set partitioning, set packing and 
maximum independence systems are known to be NP-hard [7]. 
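
To make the packing formulation tangible, the following sketch builds the incidence matrix of $t$-subsets versus $k$-subsets and solves the maximization by plain enumeration of 0-1 vectors. It is only an illustration for very small parameters (the enumeration is exponential in $\binom{v}{k}$) and has nothing to do with the branch-and-cut method itself; the function names are assumptions of this example.

```python
from itertools import combinations, product

def incidence_matrix(v, t, k):
    """Rows indexed by t-subsets, columns by k-subsets of {1..v}; entry 1 iff T is inside K."""
    t_sets = list(combinations(range(1, v + 1), t))
    k_sets = list(combinations(range(1, v + 1), k))
    W = [[1 if set(T) <= set(K) else 0 for K in k_sets] for T in t_sets]
    return W, t_sets, k_sets

def max_packing_bruteforce(v, t, k, lam=1):
    """Maximize the number of chosen k-subsets subject to W x <= lam * 1."""
    W, _, k_sets = incidence_matrix(v, t, k)
    best = []
    for x in product([0, 1], repeat=len(k_sets)):
        if sum(x) <= len(best):
            continue  # cannot improve on the incumbent
        if all(sum(w * xi for w, xi in zip(row, x)) <= lam for row in W):
            best = [K for K, xi in zip(k_sets, x) if xi]
    return best

W, t_sets, k_sets = incidence_matrix(5, 2, 3)
packing = max_packing_bruteforce(5, 2, 3)
print(len(W), len(W[0]), packing)
```

For $v = 5$, $k = 3$, $t = 2$ the matrix is $10 \times 10$, and no three triples on five points can be pairwise disjoint in their pairs, so the optimum found has two blocks.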

An independence system is a pair $(I, \mathcal{I})$ where $I$ is an $n$-set and $\mathcal{I}$ is a family of 
subsets of $I$, with the property that $I_1 \subseteq I_2 \in \mathcal{I}$ implies $I_1 \in \mathcal{I}$; the individual members 
of $\mathcal{I}$ are called independent sets. Sets $J \subseteq I$ such that $J \notin \mathcal{I}$ are said to be dependent 
and minimal such sets are called circuits. An independence system is characterized by 
its family of circuits. We observe that the circuits of our specific independence system 
are the sets of $(\lambda + 1)$ $k$-subsets of $[1, v]$ sharing a common $t$-subset, as we state in the 
following proposition. 

Proposition 1. (Packings and independence systems) 

A $t$-$(v, k, \lambda)$ packing is an independent set of the independence system given by the 
circuits 

$$\left\{ \{K_1, \ldots, K_{\lambda+1}\} : K_i \in \tbinom{[1,v]}{k} \text{ and } |K_1 \cap K_2 \cap \ldots \cap K_{\lambda+1}| \ge t \right\}.$$

The following proposition gives basic properties of the design polytopes. 

Proposition 2. (Basic properties - see [11]) 

i. If $k \le v - t$ then $\dim T_{t,v,k,\lambda} \le \binom{v}{k} - \binom{v}{t}$. 

ii. $P_{t,v,k,\lambda}$ is full dimensional. 

iii. The inequalities $x_i \ge 0$, $i = 1, \ldots, \binom{v}{k}$, are facet inducing for $P_{t,v,k,\lambda}$. 

Let $(I, \mathcal{I})$ be a $p$-regular independence system, i.e. one with all circuits with same 
cardinality $p$. A subset $I' \subseteq I$ is said to be a clique if $|I'| \ge p$ and all $p$-subsets of $I'$ are 
circuits of $(I, \mathcal{I})$. 

Theorem 3. (Clique inequalities for independence systems - Nemhauser and Trotter 
[15]) 

Suppose $I' \subseteq I$ is a maximal (with respect to set-inclusion) clique in a $p$-regular 
independence system $S = (I, \mathcal{I})$. Then, $\sum_{i \in I'} x_i \le p - 1$ defines a facet of $P(S)$. 

2.1 Characterization of Clique Inequalities of $P_{t,v,k,\lambda}$ and Efficient Separation 

A clique in such a $(\lambda + 1)$-regular independence system corresponds to a group of 
$k$-subsets of $[1, v]$ with the following intersecting properties. 

Definition 4. ($s$-wise $t$-intersecting set systems) 

Given $s \ge 2$ and $v, t \ge 1$, a family $\mathcal{A}$ of subsets of $[1, v]$ is said to be $s$-wise $t$-intersecting, 
if any $s$ members $A_1, \ldots, A_s$ of $\mathcal{A}$ are such that $|A_1 \cap \ldots \cap A_s| \ge t$. A family $\mathcal{A}$ is said 
to be $k$-uniform if every member of $\mathcal{A}$ has cardinality $k$. Let $I^{(s)}(v, k, t)$ denote the set of 
all $k$-uniform $s$-wise $t$-intersecting families of subsets of $[1, v]$. Let $MI^{(s)}(v, k, t)$ denote 
the set of all families in $I^{(s)}(v, k, t)$ that are maximal with respect to set inclusion (i.e. 
$\mathcal{A} \in I^{(s)}(v, k, t)$ such that for any $\mathcal{B} \in I^{(s)}(v, k, t)$, if $\mathcal{B} \supseteq \mathcal{A}$ then $\mathcal{B} = \mathcal{A}$). 

Proposition 5. (Characterization of generalized cliques) 

Let $\mathcal{A}$ be a family of $k$-subsets of $[1, v]$. Then $\mathcal{A}$ is a generalized clique for the 
independence system associated with a $t$-$(v, k, \lambda)$ packing if and only if $\mathcal{A} \in I^{(\lambda+1)}(v, k, t)$. 
Moreover, a clique $\mathcal{A}$ is maximal if and only if $\mathcal{A} \in MI^{(\lambda+1)}(v, k, t)$. 

Proof. By the definition above of clique in an independence system, $\mathcal{A}$ corresponds to a 
clique for the set packing independence system if and only if $\mathcal{A}$ is a family of $k$-subsets of 
$[1, v]$ such that all subfamilies of $\mathcal{A}$ with $(\lambda + 1)$ elements are circuits. By Proposition 1, 
this is equivalent to $\mathcal{A} \in I^{(\lambda+1)}(v, k, t)$. Clearly, the clique is maximal if and only if 
$\mathcal{A} \in MI^{(\lambda+1)}(v, k, t)$. 

The following corollary characterizes the clique inequalities for the polytope $P_{t,v,k,\lambda}$. 

Corollary 6. (Clique inequalities for $P_{t,v,k,\lambda}$) 

Let $\mathcal{A} \in I^{(\lambda+1)}(v, k, t)$. Then, the inequality 

$$\sum_{K \in \mathcal{A}} x_K \le \lambda$$

is valid for the polytope $P_{t,v,k,\lambda}$. Moreover, the above inequality induces a facet of 
$P_{t,v,k,\lambda}$ if and only if $\mathcal{A} \in MI^{(\lambda+1)}(v, k, t)$. 

Proof. It follows from Proposition 5 and Theorem 3. 

In [13], we classify all pairwise $t$-intersecting families of $k$-sets of a $v$-set for $k - t \le 2$ 
and all $t$ and $v$. Some of these results can be translated in terms of clique facets as follows: 

1. For any $t \ge 1$, $v \ge t + 3$, $\lambda = 1$ and $k = t + 1$, there exist exactly two distinct (up 
to isomorphism) clique facets for $P_{t,v,t+1,1}$, namely 

$$\sum_{K \supseteq T} x_K \le 1, \quad \text{for all } T \in \tbinom{[1,v]}{t}, \qquad (1)$$
$$\sum_{K \subseteq L} x_K \le 1, \quad \text{for all } L \in \tbinom{[1,v]}{t+2}. \qquad (2)$$

2. For any $t \ge 1$, $v \ge t + 6$, $\lambda = 1$ and $k = t + 2$, there exist exactly 15 distinct types 
of clique facets for $t = 1$, and 17 distinct types of clique facets for $t \ge 2$. These 
cliques are given explicitly for $t = 1$ and $t = 2$, and by a construction for $t \ge 3$ (see 
[13] for their forms). 

3. For any $t \ge 1$, $v \ge t + 2$, $\lambda \ge 2$ and $k = t + 1$, there exists exactly one distinct (up 
to isomorphism) generalized clique, namely $\sum_{K \supseteq T} x_K \le \lambda$. 

4. For arbitrary $k > t$ and $\lambda$, there exists a $v_0 = v_0(t, \lambda, k)$ such that all clique facets 
for $v \ge v_0$ are determined by those for $v_0$. 



The knowledge of the clique structure can help us design separation algorithms. 
For example, for the case of $k = t + 1$ and $\lambda = 1$ the separation of clique inequalities 
turns out to be quite efficient, as shown in the following algorithm. 

Let $G = (\mathcal{V}, E)$ be the intersection graph of $W^v_{t,t+1}$, that is, the graph with vertex set 
$\mathcal{V} = \binom{[1,v]}{t+1}$ such that $K_1, K_2 \in \mathcal{V}$ are linked by an edge if and only if $|K_1 \cap K_2| \ge t$. 



Algorithm: Separation of clique inequalities for $P_{t,v,t+1,1}$ 
Input: a fractional solution $x$ to (PDP) 
Output: a violated clique inequality or “There are no violated cliques” 
for every edge $\{K_1, K_2\} \in E$ 
    take $L = K_1 \cup K_2$ 
    if $\sum_{K \subseteq L,\ |K| = t+1} x_K > 1$ 
        return “Violated clique: ”, $L$ 
return “There are no violated cliques.” 

Let us specify our measure of complexity. The size of a design problem is measured 
by the number of bits needed for its integer programming formulation. For $t$-$(v, k, \lambda)$ 
designs, the problem size is exactly $\binom{v}{t} \times \binom{v}{k} + \binom{v}{t} \log \lambda$. Our measure of complexity 
is the number of basic operations such as arithmetic operations and comparisons. 

Corollary 7. (Separation of cliques in $P_{t,v,t+1,1}$) 

The clique facets in $P_{t,v,t+1,1}$ can be separated in polynomial time. 

Proof. The statement is implied by the correctness of the previous algorithm. Any 
fractional solution $x$ to (PDP) must satisfy $W^v_{t,t+1}\, x \le \mathbf{1}$. So, inequalities (1) are satisfied 
and the only possible violated cliques are the ones in (2). Note that there is exactly 
one clique of type (2) passing through each edge $\{K_1, K_2\}$ in the graph $G$, for 
$|K_1 \cup K_2| = t + 2$. This shows the correctness of the algorithm. The polynomiality can 
be checked by noticing that every iteration takes a polynomial number of steps and the 
number of iterations is at most the square of the number of variables in the problem. 
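
The separation routine for $k = t + 1$, $\lambda = 1$ can be sketched in a few lines. This is an illustrative sketch rather than the implementation reported in the paper: the LP solution is assumed to be a dictionary from blocks (as `frozenset`s) to values, only pairs of blocks with positive value are examined (a violated inequality of type (2) must contain at least two such blocks, and they form an edge), and the tolerance `tol` is an assumed parameter.

```python
from itertools import combinations

def separate_clique(x, t, tol=1e-9):
    """Look for a violated clique inequality of type (2) when k = t + 1.

    x: dict mapping frozenset blocks of size t + 1 to LP values in [0, 1].
    Returns (L, lhs) for a (t + 2)-set L whose inequality is violated, or None.
    """
    support = [K for K, value in x.items() if value > tol]
    for K1, K2 in combinations(support, 2):
        if len(K1 & K2) != t:
            continue                      # not an edge of the intersection graph
        L = K1 | K2                       # |L| = t + 2
        lhs = sum(x.get(frozenset(K), 0.0)
                  for K in combinations(sorted(L), t + 1))
        if lhs > 1 + tol:
            return L, lhs
    return None

# Satisfies W x <= 1 for t = 2, k = 3, yet violates the clique on {1, 2, 3, 4}.
x_frac = {frozenset({1, 2, 3}): 0.5,
          frozenset({1, 2, 4}): 0.5,
          frozenset({1, 3, 4}): 0.5}
print(separate_clique(x_frac, t=2))
```

In the example every pair of points is covered with total weight at most 1, so the inequalities of type (1) hold, while the three blocks inside $\{1, 2, 3, 4\}$ sum to 1.5.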

3 A Branch-and-Cut Algorithm for Packings and Designs 

Besides being an alternative to tackle combinatorial design problems, the branch-and-cut 
approach offers other advantages. We can adapt the general framework in order to deal 
with design specific issues such as: fixing subdesigns, extending designs, forbidding 
sub-configurations, proving non-existence of designs and assuming the action of an 
automorphism group (see [12] for details). 

Our implementation handles $t$-$(v, k, 1)$ designs and packings, both ordinary ones 
and those admitting cyclic automorphism groups. In the following, we describe subalgorithms 
and other issues specific to our implementation. The reader is referred to [1] for the 
general branch-and-cut approach. 

3.1 Initialization 

Some variable fixing can be done, in the original problem of finding $t$-designs, before 
running the branch-and-cut algorithm. For any $t$-$(v, k, 1)$ design and any given 
$(t - 1)$-subset $S$ of $[1, v]$, the blocks of the design that contain $S$ are unique up to permutations. 
Indeed, we can assume w.l.o.g. that the following blocks are present in the design: 

$$B_i = [1, t-1] \cup [\,t + (i-1)(k-t+1),\; t + i(k-t+1) - 1\,], \quad 1 \le i \le \tfrac{v-t+1}{k-t+1},$$
$$E = [1, t-2] \cup \bigcup_{1 \le i \le k-(t-2)} \{\, t + (i-1)(k-t+1) \,\},$$

and that all the other subsets of $[1, v]$ containing $[1, t]$ are not present in the design. 

3.2 Separation Algorithms 

The fractional intersection graph: The separation of clique facets relies on finding 
violated cliques in the intersection graph of the original matrix. It is well-known [5] that 
in any set packing problem, we can restrict our attention to the fractional intersection 
graph, i.e., the subgraph of the intersection graph induced by the fractional variables. 
In our experiments, this reduces the size of the graph we have to deal with from several 
thousand to a few hundred nodes. 

Separation of clique facets: Given $x$, we must find violated cliques in the fractional 
graph (corresponding to $x$). A clique $C$ is considered violated if $\sum_{i \in C} x_i - 1 >$ 
VIOLATION-TOLERANCE. Previously generated inequalities are stored in a pool of cuts 
in order to reuse them in other nodes of the tree. This is a common feature in a 
branch-and-cut algorithm (see [1]). 

GENERAL-CLIQUE-SEPARATION: The general clique detection employed by our 
algorithm works for a general graph. We borrowed several ideas from Hoffman and Padberg 
[5] and Nemhauser and Sigismondi [14]. For every node $v$, we search for a violated 
clique containing $v$. If the neighborhood of $v$ is small, say under 16 nodes, we enumerate 
every clique in the neighborhood; otherwise, we use two greedy heuristics proposed in 
[14]. 

SPECIAL-CLIQUE-SEPARATION (for $k = t + 1$): We implemented the special clique 
separation algorithm for designs with $k = t + 1$, given in Section 2.1. Recall that this algorithm 
uses the knowledge of the clique structure for these problems and examines each edge 
of the graph exactly once, since there is at most one violated clique passing through each 
edge. Note that we use the fractional intersection graph in place of the full intersection 
graph, as mentioned previously. 

Criteria for abandoning the cutting-plane algorithm: Our algorithm stops cutting-plane 
iterations if any one of the following conditions is satisfied: the optimal solution 
to the LP relaxation is integral (i.e. the subproblem rooted at the node was solved), the 
subproblem is infeasible, or the addition of the cuts is “not producing much 
improvement”. The last condition is measured by the number of cuts and the “quality” of cuts at 
the previous iteration. The cutting-plane algorithm is abandoned whenever the number 
of cuts in the previous iteration is smaller than a parameter MIN-NUMBER-OF-CUTS or 
the maximum violation is smaller than a parameter MIN-WORTHWHILE-VIOLATION. 

3.3 Partial Isomorph Rejection 

For combinatorial design problems, several subproblems in the branch-and-cut tree may 
be equivalent. Recall that a node in the branch-and-cut tree corresponds to the subproblem 
in which the variables in the path from the root to the node have their values fixed either 
to zero or one. Let $N$ be any node of the tree and denote by $\mathcal{F}_0(N)$ and $\mathcal{F}_1(N)$ the 
collections of blocks corresponding to the variables fixed to 0 and 1, respectively, in the 
path from the root of the tree to $N$. If $N$ and $M$ are nodes in the tree with $(\mathcal{F}_0(N), \mathcal{F}_1(N))$ 
isomorphic to $(\mathcal{F}_0(M), \mathcal{F}_1(M))$, then equivalent problems are going to be unnecessarily 
solved. The partial isomorph rejection we describe in this section aims at reducing the 
number of such equivalent subproblems. 

Let $N$ be a node of the branch-and-cut tree with two children $N_0$ and $N_1$. The original 
branching scheme would make $N_0$ correspond to $x_K = 0$ and $N_1$ correspond to $x_K = 1$, 
for some variable $K$. This branching scheme is modified so that the number of nodes in 
the tree is reduced by avoiding some subproblems that are equivalent to others already 
considered, as we discuss next. Let $A$ be the permutation group acting on $[1, v]$ that fixes 
$\mathcal{F}_0(N)$ and $\mathcal{F}_1(N)$, and let $A(K)$ be the orbit of $K$ under $A$. The new branching scheme 
for partial isomorph rejection lets $N_0$ correspond to “$x_L = 0$ for all $L \in A(K)$” and 
$N_1$ correspond to “$x_L = 1$ for some $L \in A(K)$”. The tree reduction comes from letting 
$N_1$ correspond, w.l.o.g., to “$x_K = 1$” instead. Thus the new branching scheme implies 

$$\mathcal{F}_0(N_0) = \mathcal{F}_0(N) \cup A(K), \qquad \mathcal{F}_1(N_0) = \mathcal{F}_1(N),$$
$$\mathcal{F}_0(N_1) = \mathcal{F}_0(N), \qquad \mathcal{F}_1(N_1) = \mathcal{F}_1(N) \cup \{K\}.$$

This new branch-and-cut tree has at most as many problems as the regular one, since 
when $|A(K)| > 1$ not only $K$ but other variables in $A(K)$ are being simultaneously fixed 
at 0 in $N_0$. All we need is an algorithm that, given collections $\mathcal{F}_0$ and $\mathcal{F}_1$ of $k$-subsets of 
$[1, v]$ and a $k$-subset $K$ of $[1, v]$, computes: (1) the permutation group $A$ acting on $[1, v]$ 
that fixes $\mathcal{F}_0$ and $\mathcal{F}_1$, and (2) the orbit $A(K)$ of $K$ under $A$. 

The first problem is equivalent to finding the permutation group acting on the vertices 
of a special graph that fixes some subsets of the vertices. Consider the bipartite graph 
$G_{[1,v],\, \mathcal{F}_0 \cup \mathcal{F}_1}$ whose vertex partition corresponds to points in $[1, v]$ and sets in $\mathcal{F}_0 \cup \mathcal{F}_1$, 
and such that $p \in [1, v]$ is connected to $F \in \mathcal{F}_0 \cup \mathcal{F}_1$ if and only if $p \in F$. Thus, our 
problem is equivalent to finding the permutation group acting on the vertices of the graph 
that fixes vertices in $[1, v]$, in $\mathcal{F}_0$ and in $\mathcal{F}_1$. This can be computed using the package 
Nauty [9], by Brendan McKay, the “most powerful general purpose graph isomorphism 
program currently available” [6]. 

The second problem can be solved by a simple algorithm that we describe now. 
A collection $\mathcal{S}$ of $k$-sets is initialized with $K$. At every step, a different set $L$ in $\mathcal{S}$ is 
considered and, for all $\pi \in A$, the set $\pi(L)$ is added to $\mathcal{S}$. The algorithm halts when all 
sets in $\mathcal{S}$ have been considered, and thus, $A(K) = \mathcal{S}$. 
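
The closure procedure just described can be sketched as a worklist algorithm. The sketch rests on an assumption not stated in the text: instead of every $\pi \in A$, it is given a list of permutations that generate $A$ (each as a dictionary mapping points to points), which yields the same orbit for a finite group. The function name `orbit` is introduced here for illustration.

```python
def orbit(K, perms):
    """Orbit of the set K under the group generated by `perms`.

    perms: list of permutations of the point set, each a dict point -> point
    (assumed to generate the group A).
    """
    start = frozenset(K)
    seen = {start}
    todo = [start]            # sets in the collection not yet considered
    while todo:
        L = todo.pop()
        for pi in perms:
            image = frozenset(pi[p] for p in L)
            if image not in seen:
                seen.add(image)
                todo.append(image)
    return seen

# Cyclic shift p -> p + 1 (mod 5) on the points 1..5.
shift = {p: p % 5 + 1 for p in range(1, 6)}
print(sorted(sorted(s) for s in orbit({1, 2}, [shift])))
```

With the cyclic shift on five points, the orbit of $\{1, 2\}$ consists of the five pairs of cyclically consecutive points.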

A small variation of the two previous methods can dramatically improve efficiency 
when $|\mathcal{F}_0 \cup \mathcal{F}_1| \ll v$. Let $R = \bigcup_{S \in \mathcal{F}_0 \cup \mathcal{F}_1} S$. Consider the graph $G_{R,\, \mathcal{F}_0 \cup \mathcal{F}_1}$ instead of 
$G_{[1,v],\, \mathcal{F}_0 \cup \mathcal{F}_1}$ and apply the method described above to compute an automorphism group 
$A'$. The points in $[1, v] \setminus R$ are isolated vertices in $G_{[1,v],\, \mathcal{F}_0 \cup \mathcal{F}_1}$ and therefore form a 
cycle in any permutation in $A$. Thus $A'$ is the restriction of $A$ to points in $R$. In order to 
compute $A(K)$, we use the method described above and compute $A'(K \cap R)$, and then 
compute $A(K)$ by taking all the $k$-subsets of $[1, v]$ that contain some set in $A'(K \cap R)$. 
Two kinds of improvements in efficiency are observed. In the first part, the original graph 
gets reduced by $|[1, v] \setminus R|$ nodes. In the second part, if $|K \cap R| < k$ the size of the set 
$\mathcal{S}$ in the second problem is reduced by a factor of 

3.4 Branch-and-Cut Tree Processing 

We implemented two strategies for the selection of the branching variable in a node. 
The first one selects the variable with largest fractional value, and the second one, the 
variable closest to 0.5. Both strategies turned out to be equivalent, since in our problems 
most of the fractional variables are smaller than 0.5. Our selection of the next node to 
process is done as a depth-first search, giving priority to nodes with variables fixed to 1. 
To help in the detection of a globally optimal solution, we use the general upper bound 
on the size of a packing given by the Schönheim bounds (see [10]). 
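
The text does not spell out the bound; a common form for packings is the nested-floor expression $\lfloor \frac{v}{k} \lfloor \frac{v-1}{k-1} \cdots \lfloor \frac{\lambda (v-t+1)}{k-t+1} \rfloor \cdots \rfloor \rfloor$, and the sketch below assumes that this is the bound meant. The function name is hypothetical.

```python
def schonheim_packing_bound(v, k, t, lam=1):
    """Nested-floor upper bound on the number of blocks of a t-(v, k, lam) packing.

    Assumed form: floor(v/k * floor((v-1)/(k-1) * ... floor(lam*(v-t+1)/(k-t+1)))).
    """
    bound = lam
    # Evaluate from the innermost floor (i = t - 1) outwards (i = 0).
    for i in range(t - 1, -1, -1):
        bound = (v - i) * bound // (k - i)
    return bound

print(schonheim_packing_bound(7, 3, 2),    # 7, attained by the Fano plane
      schonheim_packing_bound(11, 3, 2))   # 18
```

When a packing attains this value, the search can stop as soon as it is found; when the bound is not attained, optimality can only be established by exhausting the tree.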

4 Computational Results 

In this section, we report on computational experiments with the branch-and-cut 
implementation described in the previous section. The tests are run on a Sun Ultra 2 Model 
2170 workstation with 245 MB main memory and 1.2 GB virtual memory, operating 
system SunOS 5.5.1. Our branch-and-cut consists of about 10,000 lines of code written 
in the C++ language and compiled with the g++ compiler. The following packages are linked 
with our code: LEDA Library version 3.2.3 [16] for basic data structures such as lists and 
graphs, CPLEX package version 4.0.8 [3] for solving linear programming subproblems, 
and Nauty package version 2.0 [9] for finding automorphism groups of graphs. 

Our main conclusions are outlined as follows. It is advantageous to run the isomorph 
rejection algorithm. In Table 2 we report on isomorph rejection statistics. We compare 
the same instances of packings with and without isomorph rejection (specified by column 
(IRej)). Only the packings with $v = 5, 11, 12, 14$ require a call to the algorithm. From 
these parameters, only $v = 5, 11$ benefit from nontrivial orbits, and only $v = 11$ profits 
from the isomorph rejection. We observe that the time spent in the isomorph rejection 
algorithm is very small compared to the total time. The packing for $v = 11$ could not be 
found without the isomorph rejection. The difficulty encountered for $v = 11$ is that the 
Schönheim upper bound is not met by the packing size. Therefore, the program might 
find a solution of size 17, but has to go over most of the branches to conclude it is 
optimal. The isomorph rejection reduces the number of branches to be searched. We 
conclude that the isomorph rejection algorithm is effective since it spends little extra 
time and adds the benefit of tackling the hardest problems. 

The impact of cutting is analysed through a comparison between branch-and-cut 
and a straightforward branch-and-bound. We observed in our experiments that for 
designs there was no clear winner in terms of total time. For packings, the total time 
using cuts is either comparable or substantially smaller than without cuts, especially for 
the larger instances (Table 3). In all cases, using cuts reduces the number of explored tree 
nodes and the number of times the algorithm backtracks. This is reflected in the often 
smaller number of solved linear programming problems and time spent on solving 
them. However, the time spent on cut separation makes the cutting version worse for 
some instances. Larger problems should benefit from the cutting version, since linear 
programming tends to dominate the running time. 

The specialized separation is also more efficient in practice than the general separa- 
tion heuristic for clique facets. Table 5 summarizes the results of 8 runs corresponding to 
a combination of parameters described in Table 4. Even and odd numbered runs corre- 
spond to specialized and general separation algorithms, respectively. From these tables 
we observe that the specialized separation is done much faster than the general one (see 
column (ST)). In all runs the specialized separation produced savings in the total running 
time of up to 50%. 

Parameters that affect the trade-off between branching and cutting are also analysed 
in Table 5. This table shows the influence of the parameters MIN-WORTHWHILE-VIOLATION 
and VIOLATION-TOLERANCE. Let us denote them by MWV and V, 
respectively. An interval [Vmin, Vmax] is assigned to each of these parameters. The 
algorithm initially sets the value of the parameter to Vmax; as the number of fractional 
variables decreases, the parameter is continuously reduced towards Vmin. The 4 combinations 
shown in Table 4 are tried. The best runs involve (MWV, V) = ([0.3, 0.3], [0.3, 0.3]) and 
([0.3, 0.6], [0.3, 0.6]). The main conclusion is that the performance is positively affected 
by requiring stronger cuts. In all other tables, these parameters are set as MWV = V = 
[0.3, 0.6]. 
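
The text does not state the exact rule by which a parameter moves from Vmax towards Vmin. One simple possibility, shown here purely as an assumption, is linear interpolation in the fraction of variables that are still fractional (all names below are hypothetical):

```python
def scheduled_parameter(v_min, v_max, n_fractional, n_initial):
    """Interpolate a cut-quality parameter between v_min and v_max.

    Assumption (not from the paper): the value decreases linearly from
    v_max to v_min as the number of fractional variables drops from
    n_initial to 0.
    """
    if n_initial <= 0:
        return v_min
    # Clamp the remaining fraction to [0, 1] to guard against odd inputs.
    remaining = max(0.0, min(1.0, n_fractional / n_initial))
    return v_min + (v_max - v_min) * remaining


if __name__ == "__main__":
    # With the interval [0.3, 0.6] used for most tables:
    for n in (40, 20, 0):
        print(n, scheduled_parameter(0.3, 0.6, n, 40))
```

Any monotone schedule would fit the description in the text; the linear one is only the simplest choice.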

We also compared our branch-and-cut with a leading general purpose 
branch-and-bound software (CPLEX Mixed Integer Programming Optimizer), as a 
reference (see Table 5, column run# = Cplex). As problem sizes grow, our algorithm runs 
much faster than CPLEX (2 to 10 times faster in the two largest problems). 

Some difficult problems given by Steiner quadruple systems are shown in Table 6. 
The case v = 14 is already a hard instance. In earlier versions of this implementation it 
took about 6 hours to solve this problem. Currently, it takes about 40 minutes. The 
next instance, v = 16, is still a challenge for this implementation. Although the design 
is known to exist, other computational methods also fail to find this design (Mathon, 
personal communication). 

Finally, we summarize our findings on cyclic packings. Recall that these problems are 
solved by assuming a cyclic automorphism group action on the design, which produces 
reductions of the problem size. Tables 7 and 8 compare the size of maximal cyclic t-(v, k, 1) 
packings to the size of regular packings for t = 2, 3, 4, 5, k = t + 1, t + 2, and small v. 
Columns (Bl) and (C) indicate the number of base blocks and the total number of blocks 
in the cyclic packings, respectively. To the best of our knowledge, this is the first time 
these quantities are computed. In columns (D) and (S) we include known values for the 
size of a maximal ordinary packing and Schönheim upper bounds, respectively. Thus, 
we must have C ≤ D ≤ S. Observe that in most cases C is not much smaller than D (or 
not much smaller than S in the cases where D is unknown, see Table 8). Our experiments 
show that the sizes of cyclic packings are close to maximal ordinary packings (compare 
(C) and (D)), and they are much easier to compute and compact to store. Therefore, these 
objects should be very attractive for applications. The size of a maximal cyclic packing 
also offers a good lower bound for the packing number. 
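
To make the notion of base blocks concrete, here is a small sketch of how a cyclic packing can be developed from its base blocks under the action x -> x + 1 (mod v) and then checked. The function names and the example base blocks are illustrative assumptions, not taken from the authors' implementation:

```python
from itertools import combinations


def develop(base_blocks, v):
    """Return all distinct translates of the base blocks modulo v."""
    blocks = set()
    for base in base_blocks:
        for shift in range(v):
            blocks.add(frozenset((x + shift) % v for x in base))
    return blocks


def is_packing(blocks, t):
    """True if every t-subset of points lies in at most one block."""
    seen = set()
    for block in blocks:
        for subset in combinations(sorted(block), t):
            if subset in seen:
                return False
            seen.add(subset)
    return True


if __name__ == "__main__":
    # Two base blocks whose differences cover 1..12 exactly once (mod 13).
    blocks = develop([{0, 1, 4}, {0, 2, 8}], 13)
    print(len(blocks), is_packing(blocks, 2))
```

With these two base blocks the development has 26 blocks, the same values of Bl and C that Table 7 lists for v = 13, k = 3. Storing only the base blocks is what makes cyclic packings compact.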
Appendix 



Table 1. Problem sizes and statistics 

                        ordinary          cyclic
t-(v,k,1) packing      rows   cols     rows   cols
2-(5,3,1)                10     10        2      0
2-(6,3,1)                15     20        3      1
2-(7,3,1)                21     35        3      2
2-(8,3,1)                28     56        4      2
2-(9,3,1)                36     84        4      7
2-(10,3,1)               45    120        5      4
2-(11,3,1)               55    165        5     10
2-(12,3,1)               66    220        6     11
2-(13,3,1)               78    286        6     16
2-(14,3,1)               91    364        7     14
3-(6,4,1)                20     15        4      1
3-(7,4,1)                35     35        5      2
3-(8,4,1)                58     70        7      8
3-(9,4,1)                84    126       10      9
3-(10,4,1)              120    210       12     19
3-(11,4,1)              165    330       15     25
3-(12,4,1)              220    495       19     37
3-(13,4,1)              286    715       22     49
3-(14,4,1)              364   1001       26     67
3-(15,4,1)              455   1365       31     82
3-(16,4,1)              560   1820       35    110
3-(17,4,1)              680   2380       40    132
3-(18,4,1)              816   3060       46    160
3-(19,4,1)              969   3876       51    195
3-(20,4,1)             1140   4845       57    238
3-(21,4,1)             1330   5985       64    270
3-(22,4,1)             1540   7315       70    325
4-(8,5,1)                70     56       10      2
4-(9,5,1)               126    126       14      9
4-(10,5,1)              210    252       22     18
4-(11,5,1)              330    462       30     37
4-(12,5,1)              495    792       43     52
4-(13,5,1)              715   1287       55     93
4-(14,5,1)             1001   2002       73    122

                               ordinary,        ordinary,
                             before fixing     after fixing        cyclic
t-(v,k,1) design       b     rows   cols      rows   cols      rows   cols
2-(7,3,1)              7       21     35        15     21         3      2
2-(9,3,1)             12       36     84        28     55         4      7
2-(13,3,1)            26       78    286        66    219         6     16
2-(15,3,1)            35      105    445        91    363         7     25
2-(19,3,1)            57      171    969       153    815         9     42
2-(21,3,1)            70      210   1330       190   1139        10     55
2-(25,3,1)           100      300   2300       286   2023        12     80
2-(27,3,1)           117      351   2925       325   2599        13     97
2-(31,3,1)           155      495   4495       465   4059        15    130
2-(33,3,1)           176      528   5456       496   4959        16    151
3-(8,4,1)             14       56     70        50     54         7      8
3-(10,4,1)            30      120    210       112    181        12     19
3-(14,4,1)            91      364   1001       352    934        26     67
3-(16,4,1)           140      560   1820       546   1728        35    110



b: number of blocks 
Sch: Schönheim bound 
IRej: isomorph rejection algorithm 
IE: number of calls to IRej 
IR: number of successful IRej's 
Mxl: max. depth of node in successful IRej 
IT: total time in IRej 
BB: number of explored B&B nodes 
BT: number of backtracks 
LPT: total time solving LP problems 
LPs: number of LP problems 
ST: total time in separation alg. 
Cl: total number of added cliques 
TotT: total time 

Table 2. The effect of partial isomorph rejection on 2-(v, 3, 1) packings 



 v   b  Sch  IRej   BB   BT   IE  IR  Mxl    IT   LPT  LPs    ST   Cl  TotT
 5   2    3  yes     3    1    1   1    0     0     0    3     0    0  0.01
 5   2    3  no      7    3    0   0    -     0  0.02    8     0    1  0.02
 6   4    4  yes     3    0    0   0    -     0     0    3  0.01    0  0.01
 6   4    4  no      3    0    0   0    -     0  0.01    3     0    0  0.01
 7   7    7  yes     1    0    0   0    -     0  0.01    1     0    0  0.02
 7   7    7  no      1    0    0   0    -     0  0.01    1     0    0  0.02
 8   8    8  yes     5    0    0   0    -     0  0.02    8  0.02    6  0.06
 8   8    8  no      5    0    0   0    -     0  0.02    8  0.02    6  0.05
 9  12   12  yes     2    0    0   0    -     0  0.02    3     0    1  0.04
 9  12   12  no      2    0    0   0    -     0  0.02    3  0.02    1  0.05
10  13   13  yes    11    0    0   0    -     0  0.08   14  0.05    4  0.14
10  13   13  no     11    0    0   0    -     0  0.05   14  0.08    4  0.13
11  17   18  yes   649  324  324  11   10  0.26  6.41  829  2.46  319  9.94
11  17   18  no      -    -    -   -    -     -     -    -     -    -   (*)
12  20   20  yes    17    1    1   0    -     0  0.22   24  0.25   11  0.53
12  20   20  no     17    1    0   0    -     0  0.24   24  0.27   11  0.53
13  26   26  yes    13    0    0   0    -     0  0.12   14  0.34    1  0.48
13  26   26  no     13    0    0   0    -     0  0.13   14  0.33    1  0.49
14  27   28  yes    77   27   27   0    -  0.05  1.82  121  0.85   86  2.91
14  27   28  no     77   27    0   0    -     0  1.81  121  0.88   86  2.93

(*) the algorithm failed to find the designs even after exploring 400,000 branches. 



Table 3. Branch-and-cut versus branch-and-bound for 2-(v, 3, 1) packings 

 v   b  Cuts            BB            BT   IE  IR  Mxl    IT   LPT   LPs    ST   Cl  TotT
 5   2  yes              3             1    1   1    0     0     0     3     0    0  0.01
 5   2  no               3             1    1   1    0     0     0     3     0    0  0.01
 6   4  yes              3             0    0   0    -     0     0     3  0.01    0  0.01
 6   4  no               3             0    0   0    -     0  0.01     3     0    0  0.01
 7   7  yes              1             0    0   0    -     0  0.01     1     0    0  0.02
 7   7  no               1             0    0   0    -     0  0.01     1     0    0  0.02
 8   8  yes              5             0    0   0    -     0  0.02     8  0.02    6  0.06
 8   8  no               8             1    1   0    -     0  0.04     8     0    0  0.05
 9  12  yes              2             0    0   0    -     0  0.02     3     0    1  0.04
 9  12  no               4             0    0   0    -     0  0.02     4     0    0  0.02
10  13  yes             11             0    0   0    -     0  0.08    14  0.05    4  0.14
10  13  no              13             0    0   0    -     0  0.06    13     0    0  0.08
11  17  yes            649           324  324  11   10  0.26  6.41   829  2.46  319  9.94
11  17  no            1063           531  531  17   10  0.33  7.91  1063     0    0  9.72
12  20  yes             17             1    1   0    -     0  0.22    24  0.25   11  0.53
12  20  no     [illegible]   [illegible]  414   0    -  1.75  5.69   848     0    0  8.97
13  26  yes             13             0    0   0    -     0  0.12    14  0.34    1  0.48
13  26  no              44            14   14   0    -     0  0.34    44     0    0  0.46
14  28  yes             77            27   27   0    -  0.05  1.82   121  0.85   86  2.91
14  28  no             314           145  145   0    -  0.15  3.57   314     0    0  4.71

Table 4. Parameter combination for several runs. 





                               clique separation algorithm
V             MWV              general        specialized
[0.1, 0.1]    [0.1, 0.3]       run# 1         run# 2
[0.1, 0.1]    [0.3, 0.3]       run# 3         run# 4
[0.3, 0.3]    [0.3, 0.3]       run# 5         run# 6
[0.3, 0.6]    [0.3, 0.6]       run# 7         run# 8



Table 5. Clique separation and parameter variations for 2-(v, 3, 1) designs 

 v    b  run#     BB           BT     LPT  LPs      ST   Cl     TotT
25  100  1        71  [illegible]    23.1  110   92.21   91   115.76
25  100  3        71            4    23.5  110   92.21   93   116.28
25  100  5        64            0   11.62   81   79.51   30    91.38
25  100  7        57            0   12.49   65    72.7    9    85.42
25  100  2        60            1   26.37   98   20.28   97    46.94
25  100  4        60            1   25.75   98   19.96   97    45.99
25  100  6        66            0   17.11   91   18.49   42    35.78
25  100  8        60            0   9.966   69   17.27   12    27.44
25  100  Cplex    97                                      1    63.83

27  117  1        75            3   50.46  124  156.84  121   207.89
27  117  3        74            3   42.84  121  153.92  118   197.35
27  117  5       102           18    74.5  144  146.66   81   222.79
27  117  7        76            4   32.89   89  132.96   16   166.52
27  117  2        71            2   41.69  112   33.66  109    75.86
27  117  4        71            2   43.31  112   33.75  109    77.55
27  117  6        99           14   48.23  137   32.58   63    82.29
27  117  8        76            4   32.61   89   29.89   16    63.09
27  117  Cplex   230                                      6   340.48

31  155  1       135           13  165.04  200  368.27  155   536.09
31  155  3       135           13  171.42  200  375.35  155   549.59
31  155  5       119            1    98.6  160  329.61   56   429.22
31  155  7       117            9  101.22  132  318.37   24   421.60
31  155  2       112            0  122.77  166   85.07  119   208.57
31  155  4       112            0  126.88  166   82.92  119   210.55
31  155  6       112            0   70.42  145   79.29   44   150.42
31  155  8       113            7   87.61  127   76.35   25   165.56
31  155  Cplex  2538                                     34  3879.00

33  176  1       141           11  381.24  215  526.76  181   911.19
33  176  3       141           11  382.21  215  537.29  181   922.63
33  176  5       219           52  768.50  295  508.96  170  1288.51
33  176  7       257           67  599.14  295     509   64  1122.38
33  176  2       118            5  255.11  182   122.8  152   379.71
33  176  4       118            5  252.65  182  133.36  152   387.85
33  176  6       123            0  106.33  155  112.80   42   220.02
33  176  8       174           26  309.52  198  114.61   53   430.36
33  176  Cplex  1267                                     14  2651.92

Table 6. Steiner Quadruple Systems: 3-(v, 4, 1) designs 



 v   b    BB   BT  IE  IR  Mxl  IT      LPT   LPs     ST  Cl    TotT
 8  14     1    0   0   0    -   0        0     1      0   0    0.01
10  30     1    0   0   0    -   0     0.07     1      0   0    0.07
14  91  1700  838   0   0    -   0  1974.75  1755  35.59  57  2488.7



Table 7. Cyclic 2-(v, k, 1)/3-(v, k, 1) packings 

t = 2
              k = 3                     k = 4
 v     Bl     C     D     S      Bl     C     D     S
 8      1     8     8     8       1     2     2     4
 9      1     9    12    12       0     0     3     4
10      1    10    13    13       0     0     5     7
11      1    11    17    18       0     0     6     8
12      2    16    20    20       1     3     6     9
13      2    26    26    26       1    13    13    13
14      1    14    28    28       1    14    14    14
15      3    35    35    35       1    15    15    15
16      2    32    37    37       1    16    20    20
17      2    34    44    45       1    17    20    21
18      3    42    48    48       1    18    22    22
19      3    57    57    57       1    19    25    28
20      2    40    60    60       2    25    30    30
21      4    70    70    70       1    21    31    31
22      3    66    73    73       1    22    37    38
23      3    69    83    84       1    23    40    40
24      4    80    88    88       2    30    42    42

t = 3
              k = 4                     k = 5
 v     Bl     C     D     S      Bl     C     D     S
 8      1    10    14    14       0     0     2     4
 9      1     9    18    18       0     0     3     7
10      3    30    30    30       1     2     6     8
11      3    33    35    35       1    11    11    15
12      5    45    51    54       1    12    12    19
13      4    52    65    65       1    13    18    23
14      6    84    91    91       1    14    28    36
15      7   105   105   105       3    33    42    42
16     10   132   140   140       2    32    48    48
17      9   153   156   157       4    68    68    68
18     11   198   198   202       -     -     ?    75
19     12   228   228   228       -     -     ?    83
20     15   285   285   285       -     -     ?   112
21     15   315   315   315       -     -     ?   126



Table 8. Cyclic 4-(v, k, 1)/5-(v, k, 1) packings 

t = 4
              k = 5                        k = 6
 v     Bl      C      D     S      Bl     C     D     S
 8      1      8      8    11       0     0     1     1
 9      1      9     18    25       1     3     3     6
10      3     30     36    36       0     0     5    11
11      6     66     66    66       1    11    11    14
12      6     72      ?    84       2    14    22    30
13      9    117      ?   140       2    26    26    41
14     11    154      ?   182       3    42    42    53

t = 5
              k = 6                        k = 7
 v     Bl      C      D     S      Bl     C     D     S
 8      1      4      4     6       0     0     1     1
 9      1      9     12    16       0     0     1     1
10      3     30     30    41       0     0     3     8
11      6     66     66    66       0     0     6    17
12     10    110    132   132       1    12    12    24
13     12    156      ?   182       2    26    26    55
14    (*)  ≥ 273      ?   326       3    30    42    82
15    (*)  ≥ 370      ?   455       -     -     ?   113

Bl: base blocks in the cyclic packing      C: size of maximal cyclic packing 
D: size of maximal packing                 S: Schönheim upper bound 

- indicates our algorithm did not find the cyclic packings 
? indicates that the regular packing number is unknown 
(*) indicates that a (not necessarily optimal) cyclic packing was found by the algorithm 

Abstract. Two new lattice reduction algorithms are presented and analyzed. 
These algorithms, called the Schmidt reduction and the Gram reduction, are 
obtained by relaxing some of the constraints of the classical LLL algorithm. By 
analyzing the worst case behavior and the average case behavior in a tractable 
model, we prove that the new algorithms still produce “good” reduced bases while 
requiring fewer iterations on average. In addition, we provide empirical tests on 
random lattices coming from applications, that confirm our theoretical results 
about the relative behavior of the different reduction algorithms. 

1 Introduction 

A Euclidean lattice is the set of all integer linear combinations of p linearly independent 
vectors in $\mathbb{R}^n$. The vector space $\mathbb{R}^n$ is then called the ambient space. Any lattice can 
be generated by many bases (all of them of cardinality p). The lattice basis reduction 
problem aims to find bases with good Euclidean properties, that is, with sufficiently short 
and almost orthogonal vectors. The problem is old and there exist numerous notions of 
reduction; the most natural ones are due to Minkowski or to Korkine-Zolotarev. For a 
general survey, see for example [7,18]. Both of these reduction processes are “strong”, 
since they build reduced bases with, in some sense, the best Euclidean properties. However, 
they are also computationally hard to find, since they demand that the first vector of the 
basis be a shortest one in the lattice. It appears that finding such an element in a lattice 
is likely to be NP-hard [1,20]. 

Fortunately, even approximate answers to the reduction problem have numerous 
theoretical and practical applications in computational number theory and cryptography: 
factoring polynomials with rational coefficients [11], finding linear Diophantine 
approximations [9], breaking various cryptosystems [10, 14], [19] and integer linear 
programming [6,12]. In 1982, Lenstra, Lenstra and Lovász [11] gave a powerful approximation 
reduction algorithm. It depends on a real approximation parameter $t \in\, ]1, 2[$ and is called 
LLL(t). It begins with the Gram-Schmidt orthogonalizing process, then it aims to ensure, 
for each index $i$, $1 \le i \le p-1$, a lower bound on the ratio between the lengths $\ell_i$ and 
$\ell_{i+1}$ of two successive orthogonalized vectors, 

$\ell_{i+1}/\ell_i \ge s$. (1) 

So, for reducing an n-dimensional lattice, it performs at least n - 1 iterations. This 
celebrated algorithm seems difficult to analyze precisely, both in the worst case and in 
the average case. The original paper [11] gives an upper bound for the number of iterations 
of LLL(t), which is polynomial in the data size. When given p input vectors of $\mathbb{R}^n$ of 
length at most M, the data size is $O(np \log M)$ and the upper bound is $p^2 \log_t M + p$. 
Daudé and Vallée [4] exhibited an upper bound for the average number of iterations (in 
a simple natural model) which asymptotically equals $(p^2/2) \log_t n + p$. 

There are already many variations around the LLL algorithm (due for instance 
to Kannan or Schnorr [6,13]) whose goal is to find lattice bases with sharper Euclidean 
properties than the original LLL algorithm. 

Here, we choose the other direction, and we present two new variations around the 
LLL-reduction that are a priori weaker than the usual LLL reduction. They are called 
Schmidt-reduction and Gram-reduction. As for the LLL-reduction, they both depend on 
the parameter s. The Gram reduction also depends on another parameter $\gamma$. When $\gamma = 0$, 
the Gram-reduction coincides with the LLL reduction. Our algorithms are modifications 
of the LLL algorithm; they have exactly the same structure but they are based on different 
and weaker tests on the ratio between the lengths of orthogonalized vectors. Our purpose 
is twofold. On one hand, we propose more time-efficient reductions for lattices of high 
dimension. Although the new reduced bases are less sharp, they can play the same role 
as the Lovász-reduced ones in most of the applications, and they are obtained faster. On 
the other hand, the new algorithms are easier to analyze precisely so that the randomness 
and efficiency issues are much better understood. 

Plan of the paper. In Section 2, we define the new reductions and we compare them to the 
LLL reduction: We give the Euclidean qualities of any reduced basis and the worst-case 
complexity of the reduction algorithms. Section 3 presents the main tools of the average- 
analysis in a tractable model. In Section 4, we show a general threshold phenomenon 
for the ratios of lengths of two different orthogonalized vectors and we compare the 
different reduction processes on random lattices. In Section 5, we report empirical tests 
on random lattices coming from applications: the new reduction algorithms remain more 
time-efficient and their outputs are still strong enough to be useful in applications. 
Summary of results. 

(a) First, we show that our reduced bases always have Euclidean properties similar 
to the LLL reduced ones. In particular, the shortest vector is at most $(1/s)^{p-1}$ times 
longer than a shortest element in the lattice. Observe that most applications use only 
the first vector of the reduced basis. 

(b) For the worst-case number of iterations, we show for all the reduction algorithms 
the same upper bound as for the LLL algorithm. So, we cannot distinguish between these 
algorithms by a worst-case analysis. 

Then, we compare these reductions by means of average-case analysis. We adopt a 
tractable probabilistic model, where the p vectors of the input basis are chosen uniformly 
and independently in the unit ball of $\mathbb{R}^n$. For the average analysis, we use various tools: 
We begin with the result due to Daudé and Vallée [4] about the distribution function 
of the length $\ell_a$ of the a-th orthogonalized vector associated with random bases. Then, 
we generalize a method due to Laplace to the two-dimensional case and we apply this 
machinery to study the distribution function of the ratio $\ell_b/\ell_a$ between the lengths of 
two different orthogonalized vectors. More precisely, we choose for a and b two affine 
functions of the dimension n of the ambient space: 

Definition 1. For $\theta$ any real constant in [0, 1], and r any integer constant, the quantity 
$f(n) := \theta n + r$ is called an affine index (or simply an index) iff it is an element of 
$\{1, \ldots, n\}$. Moreover, such an index is called a beginning index iff the slope $\theta$ satisfies 
$\theta < 1$. It is called an ending index iff the slope $\theta$ satisfies $\theta = 1$. 

We consider the asymptotics of the distribution function of the ratio $\ell_b/\ell_a$ for $n \to \infty$ 
and when $a := \alpha n + i$, $b := \beta n + j$ are two indexes. By “almost surely”, we mean 
that the probability tends exponentially to 1 with the dimension n. We exhibit some 
quite different phenomena according to the position of the pair (a, b) with respect to the 
dimension n of the ambient space $\mathbb{R}^n$. 

(c) For a pair (a, b) of beginning indexes, the distribution function of $\ell_b/\ell_a$ presents 
a threshold phenomenon that is of independent interest. More precisely, given two real 
constants $\alpha$ and $\beta$ in [0, 1[, and two integer constants i and j, the probability 

$\Pr\{\ell_{\beta n + j}/\ell_{\alpha n + i} \le v\}$ 

follows a 0-1 law when n tends to infinity and the jump happens when v equals 
$\sqrt{1-\beta}/\sqrt{1-\alpha}$. Then, for any fixed s, we exhibit $\omega_0(s) < 1$ such that, when the ambient space 
is of sufficiently high dimension n, any random input of dimension $p := \omega n$ with 
$\omega < \omega_0(s)$ is almost surely reduced after p - 1 iterations (in the sense of all the previous 
reductions). Furthermore, we show that the new algorithms are quite efficient, even in 
the most difficult case of the full dimensional lattices (p = n), since the numbers $K_S$ 
and $K_G$ of iterations of the Schmidt and the Gram algorithms are almost surely n - 1: 
For any $\varepsilon > 0$, there exists N such that for any n > N, 

$\Pr\{K_S = n-1\} \ge 1 - \varepsilon$ ; $\Pr\{K_G = n-1\} \ge 1 - \varepsilon$. 

(d) On the contrary, for a pair (a, b) of ending indexes the distribution function of 
$\ell_b/\ell_a$ does not present a threshold phenomenon anymore. More precisely, given two 
positive integer constants i and j, the probability 

$\Pr\{\ell_{n-j}/\ell_{n-i} \le v\}$ 

admits a limit that is a continuous function of v. Thus, the LLL algorithm is much less 
time-efficient, since we show that the number $K_L$ of iterations of the LLL algorithm is 
strictly greater than n - 1, with a non-negligible probability, 

$\Pr\{K_L > n-1\} \ge 1/\sqrt{1 + (1/s)^2}$. 

For the average number of iterations of the LLL algorithm, the only known upper bound 
remains $(n^2/2) \log_t n + n$ [4]. 

2 New Reductions and Worst-Case Analysis 

First, we recall how the Euclidean properties of a basis are usually evaluated in lattice 
theory. Then, we define two new reductions: For s a real parameter defined by (1), and 
$\gamma \in\, ]0, 1]$ a fixed real, we introduce the $(s, \gamma)$-Gram reduction and the s-Schmidt reduction. 
We compare all these reductions from two points of view, the Euclidean properties of the 
output basis, and the worst-case computational complexity of the algorithms. We obtain 
the results (a), (b) of the introduction. 

2.1 Two Measures of Quality 

Let $\mathbb{R}^n$ be endowed with the usual scalar product $\langle \cdot , \cdot \rangle$ and Euclidean length 
$|u| = \langle u, u \rangle^{1/2}$. A lattice of $\mathbb{R}^n$ is the set of all integer linear combinations of a set of 
linearly independent vectors. Generally it is given by one of its bases $(b_1, b_2, \ldots, b_p)$ and the 
number p is the dimension of the lattice. So, if M is the maximum length of the vectors 
$b_i$, the data-size is $O(np \log M)$. In a lattice, there exist some invariant quantities that do 
not depend on the choice of a basis. Among these invariants, the n successive minima 
$\lambda_i$ are defined as follows: $\lambda_i$ is the smallest positive number t so that there exist in 
the lattice i independent vectors of lengths at most t. So, $\lambda_1$ is the length of a shortest 
vector. 

Intuitively, a reduced basis of a lattice consists of short vectors or equivalently it is 
nearly orthogonal. The shortness of the vectors is measured by the length defects. The 
i-th length defect $\mu_i(b)$ compares $|b_i|$ to the i-th minimum $\lambda_i$, 

$\mu_i(b) = |b_i| / \lambda_i$. 

All the reduction algorithms begin with the usual Gram-Schmidt orthogonalization 
process, which associates to a basis $b = (b_1, b_2, \ldots, b_p)$ an orthogonal basis 
$b^* = (b_1^*, b_2^*, \ldots, b_p^*)$ and a triangular matrix $m = (m_{ij})$ that expresses system b into 
system $b^*$. The vector $b_1^*$ equals $b_1$ and for $i \ge 2$, $b_i^*$ is the component of $b_i$ orthogonal to 
the vector subspace spanned by $b_1, \ldots, b_{i-1}$: 

$b_1^* = b_1$, and $b_i^* = b_i - \sum_{j<i} m_{ij} b_j^*$, where $m_{ij} = \langle b_i, b_j^* \rangle / |b_j^*|^2$, for $j < i$. 

It is clear that $m_{ij} = 0$ for $i < j$ and $m_{ii} = 1$. 

The length $\ell_i$ of the vector $b_i^*$ plays an important role in the sequel. The ratio $\ell_i / |b_i|$ 
is the sine of the angle between $b_i$ and the vector space spanned by $b_1, \ldots, b_{i-1}$. So, a 
nearly orthogonal basis has all its ratios $\ell_i / |b_i|$ close to 1 and the orthogonality defect 
$\rho(b)$ measures the “nearly orthogonality” for b, 

$\rho(b) = \prod_{i=1}^{p} |b_i| / \ell_i$. (2) 

A basis b is called size-reduced if $|m_{ij}| \le 1/2$, for $1 \le j < i \le p$. (3) 

Size-reduction is an easy tool to shorten a basis, since there is a simple algorithm that 
obtains a size-reduced basis from any basis, by integral translations of each $b_i$ parallel 
to the previous $b_j$ (j < i). But size-reduction alone does not guarantee the usual quality 
needed for a reduced basis. 
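
As a small worked illustration of these definitions, consider the following two-dimensional example. The basis is chosen here purely for illustration and does not come from the paper; the orthogonality defect is taken in its usual form, the product of the ratios $|b_i|/\ell_i$.

```latex
\begin{align*}
b_1 &= (2,0), \qquad b_2 = (1,1),\\
m_{21} &= \frac{\langle b_2, b_1^*\rangle}{|b_1^*|^2} = \frac{2}{4} = \frac12,
  \qquad b_1^* = (2,0), \qquad b_2^* = b_2 - \tfrac12\, b_1 = (0,1),\\
\ell_1 &= 2, \qquad \ell_2 = 1, \qquad \frac{\ell_2}{\ell_1} = \frac12,\\
\rho(b) &= \frac{|b_1|\,|b_2|}{\ell_1\,\ell_2} = \frac{2\sqrt2}{2} = \sqrt2 .
\end{align*}
% The lattice consists of the integer points (x, y) with x = y (mod 2),
% so its shortest nonzero vectors are (1, 1) and (1, -1):
\begin{align*}
\lambda_1 = \lambda_2 = \sqrt2, \qquad
\mu_1(b) = \frac{2}{\sqrt2} = \sqrt2, \qquad
\mu_2(b) = \frac{\sqrt2}{\sqrt2} = 1 .
\end{align*}
```

This basis is size-reduced, since $|m_{21}| = 1/2$, yet its first vector is not a shortest vector of the lattice. This illustrates why size-reduction alone is not enough.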

2.2 Concepts of Reduction 

Definition 2. Let t, s be two real parameters related by (1). Given a basis $b = (b_1, b_2, \ldots, b_p)$ 
of a lattice L and for an index i, $1 \le i \le p - 1$, we consider 

the t-Lovász condition: $\ell_{i+1}^2 + m_{i+1,i}^2\, \ell_i^2 \ge \ell_i^2 / t^2$ (4) 

the s-Siegel condition: $\ell_{i+1} / \ell_i \ge s$ (5) 

the s-Schmidt condition: [formula illegible in the source] (6) 

the $(s, \gamma)$-Gram condition: (6) and $\ell_{i+1} / \ell_i \ge$ [bound illegible in the source], with $0 \le \gamma \le 1$ (7) 

Let $C_i$ be one of the above conditions, for a fixed index i. The basis b is called C-reduced 
if it is size-reduced and if it satisfies the $C_i$ condition, for all indexes i, $1 \le i \le p - 1$. 

The condition (4) was introduced by Lovász [11] and it is used in the original LLL 
algorithm. The s-Siegel condition is an immediate consequence of (4) together with 
size-reduction (3); it is, in fact, always used rather than the t-Lovász condition and in the 
sequel we will often do so. An average study of the ratios $\ell_{i+1}/\ell_i$ shows that they all have 
1 as mean value, but their variances increase with the index i. When i is close to n, the 
ratios are very dispersed, so that the s-Siegel conditions are more and more difficult to 
satisfy and it is reasonable to fix a lower bound for $\ell_{i+1}/\ell_i$ that decreases with the index 
i. We introduce here the $(s, \gamma)$-Gram condition that takes the previous 
remark into consideration and is exactly the classical s-Siegel condition when $\gamma = 0$. The 
s-Schmidt condition¹ is the least sharp introduced here. The s-Schmidt and s-Siegel conditions 
cannot be compared, for a fixed index i. However, if the whole basis is s-Siegel reduced, then 
it is also s-Schmidt reduced. The next lemma compares the above conditions more locally; 
it is useful to study the computational complexity of the reduction algorithms. 

Lemma 3. Let t, s be two parameters related by (1). Let $(b_1, \ldots, b_p)$ be a basis and 
$i \in \{1, \ldots, p\}$ a fixed index. 

1. If the s-Siegel condition is not satisfied and $|m_{i+1,i}| \le 1/2$, then the t-Lovász 
condition is not satisfied either. 

2. If the $(s, \gamma)$-Gram condition is not satisfied, neither is the s-Siegel condition. 
Conversely, the choice of $\gamma = 0$ in the $(s, \gamma)$-Gram reduction leads exactly to the s-Siegel 
reduction. 

3. If the s-Schmidt condition is satisfied for the index i, but not for the index i + 1, then 
the s-Siegel condition is not satisfied for the index i + 1 either. 
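
The first item of the lemma can be read as the contrapositive of a one-line computation. The sketch below rests on an assumption: the t-Lovász condition is taken in its usual form, which may be presented differently in the paper, and the resulting relation between s and t is the one this form leads to.

```latex
\begin{align*}
\ell_{i+1}^2 + m_{i+1,i}^2\,\ell_i^2 &\ge \frac{\ell_i^2}{t^2}
  && \text{($t$-Lov\'asz condition, assumed form)}\\
\frac{\ell_{i+1}^2}{\ell_i^2} &\ge \frac{1}{t^2} - m_{i+1,i}^2
  \;\ge\; \frac{1}{t^2} - \frac14
  && \text{(size-reduction: } |m_{i+1,i}| \le 1/2)
\end{align*}
% Hence the s-Siegel condition holds with s^2 = 1/t^2 - 1/4, and for
% t in ]1, 2[ this value of s^2 ranges over ]0, 3/4[.
% Example: t^2 = 4/3 (the classical constant 3/4) gives s^2 = 1/2.
```

Read backwards, if $\ell_{i+1}/\ell_i < s$ while $|m_{i+1,i}| \le 1/2$, the first inequality must fail, which is the statement of item 1.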

2.3 Comparing the Quality of Reduced Bases 

The next theorem shows that the s-Schmidt reduction provides only a short vector of the 
lattice. For the three other reductions, all vectors of the reduced bases are short and the 
basis is nearly orthogonal. For the proof, see [2]. 

Theorem 4. Let $b = (b_1, \ldots, b_p)$ be a basis of a lattice.

1. If b is s-Schmidt reduced, then its first length defect is bounded from above:

Mi(&) < (8) 

2. If b is (s, γ)-Gram reduced, then its first length defect is bounded from above as in (8). All the other length defects and the orthogonality defect are also bounded from above:

Pi{b) < {1/ , for all i ^ {2, ...,p} and p{b) < {1/ .s)^^ 

2.4 Comparing Reduction Algorithms 

Let us consider the generic C-reduction algorithm, where C is one of the conditions 
introduced in Definition 2. 



¹ Schönhage's semi-reduction [16] is not too far from our Schmidt reduction.
The C-reduction algorithm:

Input: A basis $(b_1, \ldots, b_p)$ of a lattice L.

Output: A C-reduced basis b of the lattice L.

Initialization: Compute the orthogonalized system b* and the matrix m.
i := 1;

While i < p do

    $b_{i+1} := b_{i+1} - \lceil m_{i+1,i} \rfloor\, b_i$   ($\lceil x \rfloor$ is the integer nearest to x).

    Test the $C_i$ condition.

    If true, make $(b_1, \ldots, b_{i+1})$ size-reduced by translations; set i := i + 1;

    If false, swap $b_i$ and $b_{i+1}$ and update b* and m; if i ≠ 1 then set i := i − 1;

During an execution, the index i varies in $\{1, \ldots, p\}$. When i equals some $k \in \{1, \ldots, p-1\}$, the beginning lattice generated by $(b_1, \ldots, b_k)$ is already reduced. Then, the $C_k$ condition is tested. If the test is positive, size-reduction is performed and the beginning lattice generated by $(b_1, \ldots, b_{k+1})$ is reduced. So, i is incremented. Otherwise, the vectors $b_k$ and $b_{k+1}$ are swapped. At this moment, nothing guarantees that $(b_1, \ldots, b_k)$ "remains" reduced. So, i is decremented. The algorithm updates b* and m, translates the new $b_k$ in the direction of $b_{k-1}$ and tests the $C_{k-1}$ condition. Thus, the index i may fall down to 1. Finally, when i equals p, the whole basis is reduced and the algorithm terminates.
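The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the experiments: the test is passed in as a function, and the form taken here for the s-Siegel test, $\ell_{i+1}^2 \ge s^2 \ell_i^2$, is an assumption (Definition 2 is not reproduced in this section). The helper names `gram_schmidt`, `siegel`, `translate` and `c_reduce` are hypothetical, and b* and m are recomputed from scratch at each step for clarity rather than updated incrementally.

```python
from fractions import Fraction


def dot(u, v):
    """Exact inner product."""
    return sum(Fraction(x) * Fraction(y) for x, y in zip(u, v))


def gram_schmidt(basis):
    """Return (bstar, m) with m[i][j] = <b_i, b*_j> / <b*_j, b*_j> for j < i."""
    bstar, m = [], [[Fraction(0)] * len(basis) for _ in basis]
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            m[i][j] = dot(b, bstar[j]) / dot(bstar[j], bstar[j])
            v = [vk - m[i][j] * wk for vk, wk in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, m


def siegel(s2):
    """Assumed form of the s-Siegel test at index i: l_{i+1}^2 >= s^2 * l_i^2."""
    def test(bstar, m, i):
        return dot(bstar[i + 1], bstar[i + 1]) >= s2 * dot(bstar[i], bstar[i])
    return test


def translate(b, k, j):
    """Replace b_k by b_k - round(m_{k,j}) * b_j (translation along b_j)."""
    _, m = gram_schmidt(b)
    q = round(m[k][j])
    b[k] = [x - q * y for x, y in zip(b[k], b[j])]


def c_reduce(basis, condition):
    """Generic reduction loop (0-based indices); returns (basis, iterations)."""
    b = [list(v) for v in basis]
    p, i, iterations = len(b), 0, 0
    while i < p - 1:
        iterations += 1
        translate(b, i + 1, i)
        bstar, m = gram_schmidt(b)
        if condition(bstar, m, i):
            for j in range(i - 1, -1, -1):  # finish size-reducing b_{i+1}
                translate(b, i + 1, j)
            i += 1
        else:
            b[i], b[i + 1] = b[i + 1], b[i]
            if i > 0:
                i -= 1
    return b, iterations


example = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
reduced, iterations = c_reduce(example, siegel(Fraction(1, 2)))
```

Passing the test as a function mirrors the "generic" nature of the algorithm: another condition of Definition 2 would only require another predicate with the same signature.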

The following theorem shows that the C-reduction algorithm always terminates and performs a polynomial number of iterations. However, this worst-case analysis does not distinguish between the four previous reduction algorithms.

Theorem 5. Let s and t be real parameters related by (1), C one of the four conditions of Definition 2, and $(b_1, b_2, \ldots, b_p)$ any integer input basis. The maximum number K of iterations of the generic C-reduction satisfies

$K \le p(p-1) \log_t M + p - 1$, where $M := \max_i |b_i|$.

Sketch of proof. When C denotes the t-Lovász condition, the original proof of [11] is based on the decrease of the integer quantity

$$D := \prod_{i=1}^{p-1} \prod_{j=1}^{i} \ell_j^2$$

by the factor $1/t^2$, whenever a test is negative. When C is another condition, a negative test for an index i means that $(b_1, \ldots, b_{i-1})$ is C-reduced, whereas $(b_1, \ldots, b_i)$ is not. So, by Lemma 3, the t-Lovász test would be negative as well.

3 The Lengths $\ell_i$ in the Probabilistic Model and the Laplace Method

First, we define the probabilistic model. Then we give various tools that we use in the average-case analysis of the next section. We begin with the result due to Daudé and Vallée [4] about the distribution function of the length $\ell_a$ of the a-th orthogonalized vector associated with random bases. Then, we generalize a method due to Laplace for evaluating asymptotics of integrals to the two-dimensional case.
3.1 The Probabilistic Model

All the previous reduction algorithms act in the same way on a basis and on its image under a homothety. On the other hand, choosing a continuous model makes it possible to use powerful mathematical tools. These reasons justify working with bases whose vectors have length less than 1. The most natural and simplest model is thus the uniform model over all legal inputs to the reduction algorithms. So, in our analysis, the input vectors $b_1, b_2, \ldots, b_p$ ($p \le n$) are chosen independently and uniformly inside the unit ball $B_n$ of $\mathbb{R}^n$. Clearly, they almost surely form an independent system, called a random basis (p-dimensional). The lattice that it generates is called a random lattice. It is full-dimensional if p = n. Classical methods (see Section 3 in [4]) generalize our results to a discrete model.
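To make the uniform model concrete, here is a minimal Python sketch that draws a random basis and computes the lengths of its orthogonalized vectors. The sampling recipe (a Gaussian direction scaled by a radius $U^{1/n}$ with U uniform on [0, 1)) is a standard way to draw uniformly from a ball and is an assumption of this sketch, not something taken from the paper; the function names are hypothetical.

```python
import math
import random


def random_in_unit_ball(n, rng):
    """Uniform point in the unit ball of R^n: Gaussian direction, radius U^(1/n)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    r = rng.random() ** (1.0 / n)
    return [r * x / norm for x in g]


def orthogonalized_lengths(basis):
    """Lengths l_1, ..., l_p of the Gram-Schmidt vectors b*_1, ..., b*_p."""
    bstar, lengths = [], []
    for b in basis:
        v = list(b)
        for w in bstar:
            c = sum(x * y for x, y in zip(b, w)) / sum(y * y for y in w)
            v = [x - c * y for x, y in zip(v, w)]
        bstar.append(v)
        lengths.append(math.sqrt(sum(x * x for x in v)))
    return lengths


rng = random.Random(0)          # fixed seed, for reproducibility only
n, p = 20, 20                   # full-dimensional case p = n
basis = [random_in_unit_ball(n, rng) for _ in range(p)]
lengths = orthogonalized_lengths(basis)
```

Each length $\ell_i$ is at most the norm of $b_i$, hence at most 1, which is why the densities considered below live on the interval [0, 1].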



3.2 The Distribution of Variables $\ell_i$

Under the uniform model, Daudé and Vallée showed that the squares of the lengths $\ell_i$ follow a Beta law (a corollary of the next theorem). Before describing the result, we recall some usual definitions. For $u \in [0,1]$ and two positive reals p and q, the random variable X of the interval [0, 1] follows the Beta law of parameters p and q if its distribution function satisfies

$$\Pr\{X \le u\} = \frac{\int_0^u x^{p-1}(1-x)^{q-1}\,dx}{\int_0^1 x^{p-1}(1-x)^{q-1}\,dx}.$$

The numerator is called an incomplete Beta integral and denoted B(p, q, u). The normalization coefficient (the denominator) is simply a Beta integral B(p, q) [21].

The next theorem describes the density of the random variables £i. It plays a central role 
in our probabilistic analysis. 

Theorem 6. [4] If the vectors $b_1, b_2, \ldots, b_p$ are independently and uniformly distributed inside the unit ball $B_n$ of $\mathbb{R}^n$, then the lengths $\ell_i$ of their orthogonalized vectors are independent variables. The density $f_{i,n}(u)$ of $\ell_i$ on the interval [0, 1] is given by

$$f_{i,n}(u) = \frac{2}{B\!\left(\frac{n-i+1}{2}, \frac{i+1}{2}\right)}\; u^{n-i}\,(1-u^2)^{(i-1)/2}.$$
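As a quick sanity check of the density in Theorem 6, the following sketch integrates it numerically. It assumes the density reads $f_{i,n}(u) = 2\,u^{n-i}(1-u^2)^{(i-1)/2} / B((n-i+1)/2, (i+1)/2)$, which is the form under which $\ell_i^2$ follows a Beta law with those two parameters; the midpoint rule and the helper names are choices made for this illustration only.

```python
import math


def beta(p, q):
    """Complete Beta integral B(p, q) expressed with Gamma functions."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def density(i, n, u):
    """Assumed density of l_i on [0, 1] for vectors uniform in the unit ball of R^n."""
    norm = beta((n - i + 1) / 2.0, (i + 1) / 2.0)
    return 2.0 * u ** (n - i) * (1.0 - u * u) ** ((i - 1) / 2.0) / norm


def integrate(f, a, b, steps=20000):
    """Composite midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


n = 10
masses = [integrate(lambda u, i=i: density(i, n, u), 0.0, 1.0)
          for i in range(1, n + 1)]
```

For i = 1 the formula reduces to $n\,u^{n-1}$, the density of the norm of a uniform point in the unit ball, which gives an independent check of the normalization.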

All previous reduction algorithms deal with ratios of the form $\ell_b/\ell_a$. In the sequel,

$$a(n) := \alpha n + i \quad\text{and}\quad b(n) := \beta n + j, \tag{9}$$

are always affine indexes (Definition 1). We look for asymptotic equivalents for the distribution function

$$G_{n,a,b}(v) := \Pr\{\ell_{b(n)}/\ell_{a(n)} \le v\}$$

of the ratios $\ell_b/\ell_a$. When $a(n) \ne b(n)$, the previous theorem shows that $\ell_{b(n)}$ and $\ell_{a(n)}$ are independent variables and gives their densities. So, the density $f_{n,a,b}(x,y)$ of the couple $(\ell_{a(n)}, \ell_{b(n)})$ satisfies $f_{n,a,b}(x,y) = f_{a(n),n}(x)\, f_{b(n),n}(y)$. The exact expression of $G_{n,a,b}(v)$ involves the polygons $\Delta(v)$ and $\Delta(\infty)$,

$$\Delta(v) = \{(x,y) \in [0,1]^2 : y/x \le v\} \quad\text{and}\quad \Delta(\infty) = [0,1]^2,$$

and is expressed as a ratio of two generic integrals,

$$G(v) = \frac{I_n(v)}{I_n(\infty)}, \quad\text{where}\quad I_n(v) := \iint_{\Delta(v)} g(x,y)\, e^{n h(x,y)}\, dx\, dy, \tag{11}$$
with

$$g(x,y) = (1-x^2)^{(i-1)/2}\, x^{-i}\, (1-y^2)^{(j-1)/2}\, y^{-j} \tag{12}$$

and

$$h(x,y) = (\alpha/2)\ln(1-x^2) + (\beta/2)\ln(1-y^2) + (1-\alpha)\ln(x) + (1-\beta)\ln(y). \tag{13}$$

An appropriate way to study the asymptotic behavior of integrals such as I(v) and I(∞) is the Laplace method (see for example [3,5,17]). The next subsection first explains Laplace's idea. Then we generalize the Laplace method to functions g and h of two variables, with a polygon as integration domain.

3.3 The Laplace Method for Integrals 

Let us consider the convergent integral = f^F{x,n)dx. 

It often happens that the graph of F(x, n), considered as a function of x, has somewhere a sharp peak, and that the contribution of a neighborhood of the peak is almost equal to the whole integral when n is large. Then we can try to approximate F in that neighborhood by simpler functions, for which the integral can be evaluated. The advantage is that we need only a local approximation for F. This idea is due to Laplace and it is often used to find the asymptotic behavior of simple integrals. For our needs, we have to generalize Laplace's idea to the double integrals that appear in (11): the integration domain Δ is always a convex planar polygon and the functions g and h satisfy some strong assumptions. For a more precise version and for the proof, see [2].
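The one-dimensional idea can be illustrated on an integral whose exact value is known. The example below is not taken from the paper: it uses $I_n = \int_0^1 (x(1-x))^n\,dx = B(n+1, n+1)$, whose integrand $e^{n h(x)}$ with $h(x) = \ln(x(1-x))$ has a sharp peak at $x_0 = 1/2$, where $h(x_0) = -\ln 4$ and $h''(x_0) = -8$. The classical Laplace approximation is then $e^{n h(x_0)} \sqrt{2\pi / (n\,|h''(x_0)|)}$.

```python
import math


def exact(n):
    """B(n+1, n+1) = (n!)^2 / (2n+1)! = 1 / ((2n+1) * C(2n, n))."""
    return 1.0 / ((2 * n + 1) * math.comb(2 * n, n))


def laplace(n):
    """Peak contribution: e^{n h(x0)} * sqrt(2 pi / (n |h''(x0)|)) with x0 = 1/2."""
    return math.exp(-n * math.log(4.0)) * math.sqrt(2.0 * math.pi / (8.0 * n))


# The ratio exact / approximation should approach 1 as n grows.
ratios = {n: exact(n) / laplace(n) for n in (10, 100, 400)}
```

Only the local behavior of h near its maximum enters the approximation, which is the feature that the two-dimensional generalization below exploits.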

Proposition 7. Let Δ be a polygon and $I_n$ a sequence of absolutely convergent integrals $I_n = \iint_{\Delta} g(x,y)\, e^{n h(x,y)}\, dx\, dy$, where

(a) the function h has an absolute and strict maximum on Δ, say at $(x_0, y_0)$,

(b) there are two linear forms X and Y of the variables $(x - x_0, y - y_0)$ such that near $(x_0, y_0)$ and inside Δ, the functions h and g are approximated as follows

h{xyy)-h{xo,yo) = 0{Hi{X) + H 2 {Y)), when (x,y) — > (xo,yo), (14) 

where Hi,H 2 are one— variable polynomials of low degrees^ pAf and a >1. 

$g(x, y) \sim C\, X^{\lambda} Y^{\mu}$ when $(x, y) \to (x_0, y_0)$, (15)

where $C \ne 0$, $\lambda > -1$ and $\mu > -1$ are real constants.

Then when n is large, In = O n . 

Remarks 1. If the maximum $(x_0, y_0)$ is inside the polygon Δ and not on its boundary, then λ and μ are positive even integers. If the maximum $(x_0, y_0)$ is on the boundary but not at a vertex of the polygon Δ, then Y = 0, introduced in (b), is the equation of the polygon side that contains $(x_0, y_0)$ and λ is a positive and even integer. If the maximum $(x_0, y_0)$ is at a vertex of the polygon Δ(v), then X = 0, Y = 0 are the equations of the two polygon sides containing $(x_0, y_0)$.

² The low degree of a polynomial is the minimal index i such that $a_i \ne 0$.
2. We need the following quite particular case in Theorem 8: If

(i) the maximum $(x_0, y_0)$ is at a vertex of the polygon Δ and v denotes the tangent of the angle of Δ at its vertex $(x_0, y_0)$, which is acute,

(ii) the linear forms X and Y equal $X := x - x_0$, $Y := y - y_0$ and they may not correspond to the polygon sides' directions,

(in) in (14), Hi{X) = H 2 (X) = —AX'^, with A < 0, then when n is large. 



C f \ + l M+ 1 \ + + nh{x„,v„) 

A \ 2 ' 2 ' l + ^ 2 ’ 




■ ( 16 ) 



4 From Ratios $\ell_b/\ell_a$ to the Reduction Algorithms

We apply here the machinery developed in the last section to study the asymptotic distribution function of the ratio $\ell_b/\ell_a$ between the lengths of two different orthogonalized vectors. More precisely, n denotes the dimension of the ambient space, and a(n) and b(n) are two affine indexes, as defined by (9). Theorem 8 exhibits quite different phenomena according to the position of the pair (a, b) with respect to n.

Then we differentiate the behaviors of the previous reduction algorithms (Theorem 9). The reduction algorithms always operate on a random basis $(b_1, \ldots, b_p)$ ($p \le n$). When using the term "almost surely", we mean that the probability tends exponentially to 1 with the dimension n. In this section, we prove the points (c), (d) of the Introduction.

4.1 Asymptotic Behavior of $G_{n,a,b}(v)$

Proposition 7 shows that the asymptotic behaviors of the integrals $I_n(v)$ and $I_n(\infty)$ that are involved in the expression (11) of $G_{n,a,b}(v)$ depend strongly on the maximum of the function h (13) on the integration domain. The function h has a strict and global maximum on $\Delta(\infty) = [0,1]^2$, at $(\sqrt{1-\alpha}, \sqrt{1-\beta})$. Three different behaviors arise according to the relative positions of this point and Δ(v):

1. The point $(\sqrt{1-\alpha}, \sqrt{1-\beta})$ is not in Δ(v). Then, one shows easily that the function h has a global and strict maximum on Δ(v), which is on the boundary y = vx, say at $(\xi, v\xi)$. Since Δ(∞) strictly contains Δ(v), clearly $D_{\alpha,\beta,v} := \exp\big(h(\xi, v\xi) - h(\sqrt{1-\alpha}, \sqrt{1-\beta})\big) < 1$. Then, Proposition 7 gives equivalents for I(v) and I(∞). Thus, the distribution function tends exponentially to 0 with n. More precisely, there are two constants $D_{\alpha,\beta,v} < 1$ and f(i, j), such that

$$G_{n,a,b}(v) = O\big((D_{\alpha,\beta,v})^{n}\, n^{f(i,j)}\big).$$

2. The point $(\sqrt{1-\alpha}, \sqrt{1-\beta})$ is inside Δ(v), but not on its boundary y = vx. So, the function h has the same maximum on Δ(v) and on Δ(∞). Moreover, the maximum $(\sqrt{1-\alpha}, \sqrt{1-\beta})$ has the same neighborhood as an element of Δ(v) and as an element of Δ(∞). By Proposition 7, I(v) and I(∞) have the same equivalents and the distribution function tends to 1 with n. Further, since $G_{n,a,b}(v) = 1 - G_{n,b,a}(1/v)$, the convergence is exponential,

$$1 - G_{n,a,b}(v) = O\big((D_{\beta,\alpha,1/v})^{n}\, n^{f(j,i)}\big).$$

3. The point $(\sqrt{1-\alpha}, \sqrt{1-\beta})$ is on the boundary y = vx of Δ(v). The function h has the same maximum on Δ(v) and on Δ(∞). But the point $(\sqrt{1-\alpha}, \sqrt{1-\beta})$ does not have the same neighborhood as an element of Δ(v) and as an element of Δ(∞). Then, it follows from (16) that for any fixed v, the distribution function G(v) tends to a constant, strictly in ]0, 1[.



Theorem 8. Let the vectors $b_1, b_2, \ldots, b_n$ be independently and uniformly distributed inside the unit ball of $\mathbb{R}^n$, and let $\ell_a$ denote the length of the a-th orthogonalized vector.

(i) If $a(n) := \alpha n + i$, $b(n) := \beta n + j$ are beginning indexes, i.e. $(\alpha, \beta) \in [0, 1[^2$, then the distribution function G(v) of the ratio $\ell_b/\ell_a$ follows asymptotically, and exponentially with n, a 0-1 law. The jump is at $v = \sqrt{1-\beta}/\sqrt{1-\alpha}$.

(ii) If $a(n) := n - i$, $b(n) := n - j$, i.e. α = β = 1 and i, j are two positive integer constants, then the asymptotic distribution function of $\ell_b/\ell_a$ does not follow a 0-1 law, but varies continuously with i, j and v, in ]0, 1[:

$$\Pr\left\{\frac{\ell_{n-j}}{\ell_{n-i}} \le v\right\} \longrightarrow \frac{B\!\left(\frac{j+1}{2}, \frac{i+1}{2}, \frac{v^2}{1+v^2}\right)}{B\!\left(\frac{j+1}{2}, \frac{i+1}{2}\right)} \quad\text{when } n \to \infty. \tag{17}$$
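The limit law for ending indexes can be evaluated numerically. The sketch below assumes that the limit in (17) is the ratio of the incomplete Beta integral $B((j+1)/2, (i+1)/2, v^2/(1+v^2))$ to the Beta integral $B((j+1)/2, (i+1)/2)$, which is the form obtained from Theorem 6 when both indexes are at bounded distance from n; this reading, the quadrature and the function names are assumptions of the illustration. With i = 1, j = 0 and $v = s = \sqrt{1/2}$ it returns the asymptotic probability that the last s-Siegel condition fails, namely $1/\sqrt{1+(1/s)^2}$.

```python
import math


def beta(p, q):
    """Complete Beta integral B(p, q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def incomplete_beta(p, q, u, steps=4000):
    """B(p, q, u) = int_0^u x^(p-1) (1-x)^(q-1) dx, for u < 1 and p >= 1/2.

    The substitution x = t^2 removes the singularity of x^(p-1) at 0 when p = 1/2.
    """
    top = math.sqrt(u)
    h = top / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += 2.0 * t ** (2 * p - 1) * (1.0 - t * t) ** (q - 1)
    return h * total


def limit_distribution(i, j, v):
    """Assumed limit of Pr{ l_{n-j} / l_{n-i} <= v } as n grows."""
    p, q = (j + 1) / 2.0, (i + 1) / 2.0
    u = v * v / (1.0 + v * v)
    return incomplete_beta(p, q, u) / beta(p, q)


s = math.sqrt(0.5)                       # usual value s^2 = 1/2
last_siegel_fails = limit_distribution(1, 0, s)
```

The value obtained for the usual parameter $s^2 = 1/2$ is $1/\sqrt{3}$, so under this reading a random full-dimensional basis fails the last Siegel test with probability well away from 0.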



4.2 Satisfying the C Condition for a Fixed Index

Theorem 8 shows that for any beginning index, a random lattice satisfies any C condition of Definition 2. In short, for the random bases of the uniform model the "serious" reduction problems occur for the ending indexes. Considering such indexes, we classify the previous reductions into two groups: first, the s-Schmidt and (s, γ)-Gram conditions are almost surely satisfied even for an ending index, by (i) of Theorem 8; second, the s-Siegel and t-Lovász conditions are not, by (ii) of Theorem 8.

4.3 Full-Dimensional Random Lattices in $\mathbb{R}^n$

Here, we confirm the separation into two groups, made in Section 4.2, of the behaviors of the reduction algorithms introduced previously.

First, let C denote the s-Schmidt or the (s, γ)-Gram condition. The previous paragraph showed that for any index $i \in \{1, \ldots, n\}$, $C_i$ is almost surely satisfied. The next theorem makes precise that the conditions are almost surely satisfied all together. So, all the tests $C_i$ are positive and the number of iterations equals n − 1. In other words, almost surely, the C-reduction algorithm just size-reduces the input basis and verifies that the tests are fulfilled. The proof [2] is technical (particularly for the Gram reduction) and is based once more on Laplace's idea for evaluating the asymptotic behavior of sums.

Second, consider the s-Siegel (or the t-Lovász) reduction. If a random basis is reduced, then in particular the last Siegel condition is satisfied. By relation (17), we find the asymptotic probability that the last s-Siegel condition is not satisfied, and thus we give a lower bound for the probability that a random lattice is not Siegel reduced. Equivalently, during an execution of the Siegel reduction algorithm, the index i is decremented at least once with non-negligible probability.

Theorem 9. Let $K_L$, $K_G$, $K_S$ denote the numbers of iterations of the s-Siegel, (s, γ)-Gram and s-Schmidt reductions, when they operate on a random full-dimensional basis. For any 0 < ε < 1, there exists N (which depends also on γ in the (s, γ)-Gram reduction), such that for n > N,

Pr{K_L > n − 1} ≥ 1/√(1 + (1/s)²)     (18)

Pr{K_G = n − 1} ≥ 1 −     (19)

Pr{K_S = n − 1} ≥ 1 −     (20)

4.4 Lattices of Low Dimension in $\mathbb{R}^n$

For any real parameter s, 0 < s < 1, let $\omega_0(s)$ be the greatest real satisfying

< (1 - uj)uj^ 

By using methods similar to (and simpler than) those in the proof of (19) and (20), we show that the number of iterations of the s-Siegel algorithm, when it works on a random input of dimension p := ωn with $\omega < \omega_0(s)$, is almost surely p − 1. Comparing this result with (18) shows, in particular, the importance of the ratio between the dimensions of the input basis and the ambient space for the average behavior of the reduction process.

5 Statistical Evaluations 

The previous analysis shows that for random inputs of the uniform model, the new reduced bases are obtained faster. Several questions remain in practice.

(1) Is the output basis of the LLL algorithm of much better quality?

(2) We showed that, very probably, the number of iterations of the classical LLL algorithm is strictly greater than n − 1 (the minimum possible). On the other hand, the only bound established for the average number of iterations of the LLL algorithm is $n^2 \log n$ [4]. In practice, is the t-Lovász reduced basis much slower to obtain than the new reduced bases?

(3) The uniform model is a tractable one for a mathematical average-case analysis. But in usual applications of lattice reduction, the lattices do not really follow the uniform model. Are the new reduced bases still obtained faster with random inputs coming from applications?

Our next experimental study gives some insight into the answers to these questions. For every model considered, we report experimental mean values obtained with 20 random inputs. For each model, we provide three tables. The first table shows the average number of iterations of the different reduction algorithms. The second table gives the n-th root of the orthogonality defect of the different output bases (the first line corresponds to the input basis). The third table describes the ratio between the lengths of the shortest output vector of the new reduction algorithms and of the classical LLL algorithm. For the approximation parameters (s or t), we generally use the usual values t² = 4/3 (s² = 1/2), except for the last row of each table, where the optimal Schmidt algorithm is considered (s² = 3/4).
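The quality measure reported in the second table of each model can be computed as follows. This is a minimal sketch that assumes the standard definition of the orthogonality defect, $\rho(b) = \prod_i |b_i| / \prod_i \ell_i$ (the definition itself is given earlier in the paper and is not reproduced in this section); the function name is hypothetical and logarithms are used only to avoid overflow in large dimensions.

```python
import math


def orthogonality_defect_root(basis):
    """n-th root of prod |b_i| / prod l_i, computed through logarithms."""
    bstar, log_defect = [], 0.0
    for b in basis:
        v = [float(x) for x in b]
        for w in bstar:
            c = sum(x * y for x, y in zip(b, w)) / sum(y * y for y in w)
            v = [x - c * y for x, y in zip(v, w)]
        bstar.append(v)
        norm_b = math.sqrt(sum(float(x) * float(x) for x in b))
        norm_bstar = math.sqrt(sum(x * x for x in v))
        log_defect += math.log(norm_b) - math.log(norm_bstar)
    return math.exp(log_defect / len(basis))


orthogonal = orthogonality_defect_root([[1, 0], [0, 1]])   # orthogonal basis
skewed = orthogonality_defect_root([[1, 0], [1, 1]])       # |b_2| = sqrt(2), l_2 = 1
```

An orthogonal basis gives the value 1, and the value grows as the basis vectors become less orthogonal.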

5.1 Experimentations on the Uniform Model 

Table 1 largely confirms our average-case analysis. (To generate such random integer inputs we use ideas of Knuth and Brent [8].) The number of iterations of the new reduction algorithms is always the minimum possible for lattices of dimension greater than 20. Moreover, the output basis of the LLL algorithm is not of better quality than the new reduced bases, while it requires many more iterations to be obtained. In other words, on random inputs of the uniform model, the LLL algorithm requires a non-negligible number of iterations to build a basis of quality similar to that of its input. By contrast, the new algorithms immediately detect the acceptable quality of such an input.



           Number of iterations          n-th root of orth. defect    Ratio of shortest vectors
dim        10     20     30     40       10    20    30    40         10    20    30    40
LLL      59.8  132.7  149.8  191.4     1.20  1.54  1.71  1.82          1     1     1     1
Gr/Sch    9.4   19     29     39       1.42  1.69  1.71  1.77       1.04  1.07  1.00  1.00
Sch opt  22.9   19.8   29     39       1.33  1.69  1.71  1.77       1.04  1.07  1.00  1.00

Table 1. Comparison between the number of iterations and the quality of the output bases on random inputs of the uniform model. The input vectors are of length at most 2^dim, where dim is the dimension of the input. On the left, we report the number of iterations, in the middle the n-th root of the orthogonality defect, and on the right the ratio of lengths of the shortest output vector (reference: LLL).



5.2 Experimentations in the "SubsetSum" Model

Given $a_1, \ldots, a_n, M$, consider the basis $(b_1, \ldots, b_{n+1})$ of $\mathbb{Z}^{n+2}$ formed with the rows of the following matrix:

$$\begin{pmatrix} 2 & 0 & \cdots & 0 & n a_1 & 0 \\ 0 & 2 & \cdots & 0 & n a_2 & 0 \\ \vdots & & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 2 & n a_n & 0 \\ 1 & 1 & \cdots & 1 & n M & 1 \end{pmatrix}$$

Lattices generated by such a basis are used by Schnorr and Euchner [15] to solve almost all subset sum problems. Moreover, lattices of similar shape are used in many other applications [19]. Here, our results are obtained from 20 random inputs of the "SubsetSum" model, generated as follows: pick random numbers $a_1, \ldots, a_n$ in the interval $[1, 2^n]$, pick a random set $J \subset \{1, \ldots, n\}$ and put $M = \sum_{i \in J} a_i$. (The choice of the $a_i$ in the interval $[1, 2^n]$ leads us to deal with lattices of density 1, which are the most difficult in cryptographic applications.)
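The generation procedure just described can be sketched as follows. The way the random set J is drawn (each index kept independently with probability 1/2) is an assumption of this sketch, as are the function names; the row layout follows the matrix above, with n + 1 rows in dimension n + 2.

```python
import random


def subset_sum_instance(n, rng):
    """Draw a_1..a_n in [1, 2^n], a random index set J, and M = sum of a_i over J."""
    a = [rng.randint(1, 2 ** n) for _ in range(n)]
    J = [i for i in range(n) if rng.random() < 0.5]   # assumed sampling of J
    M = sum(a[i] for i in J)
    return a, J, M


def subset_sum_basis(a, M):
    """Rows b_1..b_{n+1}: b_i = (2 e_i, n*a_i, 0) and b_{n+1} = (1, ..., 1, n*M, 1)."""
    n = len(a)
    rows = []
    for i in range(n):
        row = [0] * (n + 2)
        row[i] = 2
        row[n] = n * a[i]
        rows.append(row)
    rows.append([1] * n + [n * M, 1])
    return rows


rng = random.Random(1)            # fixed seed, for reproducibility only
a, J, M = subset_sum_instance(10, rng)
basis = subset_sum_basis(a, M)
# The solution J yields a lattice vector with entries +1 or -1, a zero, and a final -1.
short = [sum(basis[i][c] for i in J) - basis[-1][c] for c in range(len(a) + 2)]
```

The vector `short` illustrates why reduction helps here: a solution of the subset sum problem corresponds to a lattice vector whose first n entries are ±1 and whose (n+1)-th entry vanishes.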

Table 2 shows that the new reduced bases are still obtained considerably faster, while on average they remain of similar quality. Let us point out in particular the optimal Schmidt algorithm, which obtains an output vector at most twice as long as the shortest output vector of the classical LLL algorithm, after a number of iterations that is on average three times smaller.



6 Conclusion 

We have presented and analyzed two efficient variations of the LLL algorithm. 







A. Akhavi 



Number of iterations

dim          10      20      30      40      50      60      80      90     100     110
LLL         140     449     782    1131    1775    1763    2163    2266    2363    2600
Sch        29.9    75.1     137     208     279     340     462     483     500     596
Gram         36      87     148     218     286     372     813     951    1115    1363
Sch opt      62     144     254     393     542     614     737     758     779     810

n-th root of the orthogonality defect

dim          10      20      30      40      50      60      80      90     100     110
rand      1.1e3     2e6   0.2e9  4.3e12    5e15  6.6e18  8.9e24    1e28  1.2e31  1.4e34
LLL         1.2     1.6     2.1     2.8       3     3.3     3.8     3.8     4.2     4.1
Sch         2.3     5.9      10      18      31      54      64      48      38      35
Gram          2     4.3     8.6      16      28      31      19      19      20      19
Sch opt     1.5     2.6       4     6.3     8.3     7.3       9     8.3      10     9.9

Ratio between the lengths of the shortest output vectors (reference: LLL)

dim          10      20      30      40      50      60      80      90     100     110
Sch         1.4     2.4     2.7     4.2     6.5     5.4     5.7     6.4     5.4     3.8
Gram        1.4       2     2.6     3.9     5.9     5.4     2.6     3.8     3.5     3.4
Sch opt     1.1     1.4     1.6     1.6     1.9     1.7     1.8     1.7     1.7     2.1

Table 2. Comparison between the number of iterations and the quality of the output bases on random inputs of the SubsetSum model of density 1.

From a theoretical point of view, we have exhibited several threshold phenomena in random lattices of the uniform model: (1) the distribution function of $\ell_b/\ell_a$, when a and b are beginning indexes, follows asymptotically a 0-1 law; (2) a gap occurs in the behaviors of these distribution functions as (a, b) becomes a pair of ending indexes; (3) there is a gap between the reduction probabilities of a random input for (s, γ)-Gram reduction (γ > 0) and s-Siegel reduction (which coincides with (s, γ)-Gram for γ = 0).

From a practical point of view, the empirical tests of Section 5 show that our new reductions are quite interesting. They provide some new tools for lattice reduction and can be very useful for building an algorithmic reduction strategy.
Abstract. In the CUT PACKING problem, given an undirected connected graph
G, it is required to find the maximum number of pairwise edge disjoint cuts in 
G. It is an open question if CUT PACKING is NP-hard on general graphs. In 
this paper we prove that the problem is polynomially solvable on Seymour graphs 
which include both all bipartite and all series-parallel graphs. We also consider 
the weighted version of the problem in which each edge of the graph G has a 
nonnegative weight and the weight of a cut D is equal to the maximum weight of 
edges in D. We show that the weighted version is NP-hard even on cubic planar 
graphs. 



1 Introduction 

In the CUT PACKING problem, given an undirected connected graph G, it is required 
to find the maximum number of pairwise edge disjoint cuts in G. This problem looks 
natural and has various connections with many well-known optimization problems on 
graphs. E. g., one can observe that INDEPENDENT SET can be treated as a constrained 
version of CUT PACKING whose collection of feasible cuts consists of all stars. The 
directed counterpart of CUT PACKING — DICUT PACKING — is well known to 
be polynomially solvable on general graphs (see Lucchesi [14], Lucchesi and Younger 
[15], Frank [8] or Grotschel et al. [11, p. 252]). By contrast, the complexity status of 
CUT PACKING on general graphs still remains an open problem. In this paper we use 
a characterization in [1] and an algorithmic result on joins due to Frank [9] to prove 
that CUT PACKING is polynomially solvable when restricted to the family of Seymour 
graphs. To present a rigorous definition of Seymour graph we need to introduce a few 
more notions. At this point we only notice that the family includes both all bipartite and 
all series-parallel graphs (Seymour [18] [19]). It should be also noted that the complexity 
status of the recognition problem for Seymour graphs remains unclear; it is known only 
that the problem belongs to co-NP [1]. All we have to say for ourselves is that our result also holds for polynomial-time recognizable subfamilies of Seymour graphs such as, e.g., the above-mentioned bipartite* and series-parallel graphs.

* This research was partially supported by the Russian Foundation for Basic Research, grants 97-01-00890, 99-01-00601.

* András Frank pointed out to me that the polynomial-time solvability in the case of bipartite graphs was shown earlier and in a different way by D. H. Younger [20].
Graphs in the paper are undirected with possible multiple edges and loops. Given a graph G and $v \in V(G)$, $N_G(v)$ will denote the set of vertices adjacent to v in G. Any inclusion-wise minimal cut can be represented in the form $\delta_G(X)$, i.e., as the set of edges leaving a set $X \subseteq V(G)$. If the context allows we shall omit subscripts and write $N(v)$, $\delta(X)$.

A join in a graph G is a set of edges $J \subseteq E(G)$ such that each circuit of G has at most half of its edges in J. Matchings and shortest paths between any two vertices are simple examples of joins. The term "join" originates from the classical notion of T-join introduced by Edmonds and Johnson in their seminal paper [7]. Given a graph G and a vertex subset $T \subseteq V(G)$ of even cardinality, a subset J of edges of G is called a T-join if the set of vertices having odd degree in the subgraph of G spanned by J coincides with T. It can be shown that any minimum cardinality T-join is a join and, moreover, any join is a minimum cardinality T-join where T is the set of vertices having odd degree in the subgraph of G spanned by J (Guan's lemma [12]). This establishes a one-to-one correspondence between joins and minimum cardinality T-joins of a graph.

Let $\{D_1, \ldots, D_k\}$ be a collection of disjoint cuts in G. Pick an edge $e_i$ in each cut $D_i$ and set $J = \{e_1, \ldots, e_k\}$. Since each cut and each circuit in a graph may have in common only an even number of edges, J is a join. If a join J in G can be represented in this way, we call $\{D_1, \ldots, D_k\}$ a complete packing of J and say that J admits a complete packing.

There are graphs in which not every join admits a complete packing. A simple example is $K_4$. Any two cuts of $K_4$ intersect and therefore every join admitting a complete packing in $K_4$ consists of at most one edge, whereas any perfect matching of $K_4$ constitutes a join of cardinality 2.
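The claim about $K_4$ is small enough to be checked by exhaustive search. The following Python sketch enumerates all cuts of a small graph and finds the largest family of pairwise edge-disjoint cuts by brute force; it is meant only as an illustration on tiny graphs (its running time is exponential) and the function names are hypothetical.

```python
from itertools import combinations


def cuts(n, edges):
    """All distinct nonempty cuts delta(X), as frozensets of edge indices.

    Vertices are 0..n-1; vertex 0 is kept outside X so that delta(X) and
    delta(V - X) are not listed twice.
    """
    found = set()
    for size in range(1, n):
        for X in combinations(range(1, n), size):
            inside = set(X)
            cut = frozenset(k for k, (u, v) in enumerate(edges)
                            if (u in inside) != (v in inside))
            if cut:
                found.add(cut)
    return found


def max_cut_packing(n, edges):
    """Maximum number of pairwise edge-disjoint cuts, by exhaustive search."""
    all_cuts = sorted(cuts(n, edges), key=len)
    best = 0

    def extend(start, used, count):
        nonlocal best
        best = max(best, count)
        for k in range(start, len(all_cuts)):
            if not (all_cuts[k] & used):
                extend(k + 1, used | all_cuts[k], count + 1)

    extend(0, frozenset(), 0)
    return best


k4 = list(combinations(range(4), 2))          # complete graph on 4 vertices
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]         # 4-cycle, a bipartite graph
packing_k4 = max_cut_packing(4, k4)
packing_c4 = max_cut_packing(4, c4)
```

On $K_4$ the search finds no two disjoint cuts, whereas on the 4-cycle the two stars at opposite vertices form a packing of size 2.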

A graph G is called a Seymour graph if every join of G admits a complete packing. 

A co-NP characterization of Seymour graphs is provided by the following theorem 
which is an easy corollary of a stronger result in [1] (a bit shorter alternative proof can 
be found in [2]). 

Let G be a graph and J be a join of G. We say that a circuit C of G is J-saturated if exactly half of the edges of C lie in J. A subgraph H of G is called J-critical if it is non-bipartite and constitutes the union of two J-saturated circuits. E.g., if $G = K_4$ and J is a perfect matching, then G itself is J-critical.

Theorem 1 (Ageev, Kostochka & Szigeti [1]). A graph G is not a Seymour graph if and only if G has a join J and a J-critical subgraph having maximum degree 3.

The “if part” of this theorem is easy and holds even when no bound on the maximum 
degree is assumed (see [1]). 

To prove that CUT PACKING is polynomially solvable on Seymour graphs we shall 
use the above characterization and the following algorithmic result. 

Theorem 2 (Frank [9]). Given an undirected graph G, a join of maximum cardinality 
in G can be found in polynomial time. 

The running time of Frank's algorithm is bounded by O(mn), where n = |V(G)|, m = |E(G)| [9]. Let G be a Seymour graph and J be a join of G with maximum cardinality returned by Frank's algorithm. By the definition of Seymour graph, J admits a complete packing $\{D_1, \ldots, D_k\}$, where k = |J|. Since the size of any collection of disjoint cuts in G does not exceed k, the collection $\{D_1, \ldots, D_k\}$ provides an optimal solution for CUT PACKING on G. The question remains how to find such a collection in polynomial time.

We shall prove that even a bit more can be done. 

Theorem 3. Given a Seymour graph G and a join J in G, a complete packing of J can 
be found in polynomial time. 

This contrasts with the well-known result of Middendorf and Pfeiffer [17] that, given a planar cubic graph G and a join J of G, it is NP-complete to determine whether J admits a complete packing.

We notice that alternative and a bit more sophisticated proofs of Theorem 3 can be 
extracted directly from the arguments in [1] and [2]. 

Our proof of Theorem 3 relies upon three basic facts. The first fact (Lemma 4) states that the family of Seymour graphs is closed under the operation of star contraction of a graph. To prove this we use Theorem 1. Lemma 4 is of interest independently of the scope of this paper. In [3] we develop the related subject to elaborate an alternative co-NP characterization of Seymour graphs. The second fact (Lemma 5) is well known among "T-joins" experts and reveals a connection between the same operation and joins admitting complete packings. Unfortunately, no published proof of this important statement has appeared. To make the paper self-contained we include such a proof in Section 3. The third fact is the classical result due to Edmonds and Johnson [7] that, given a graph G and an even vertex subset T, there exists a polynomial-time algorithm to find a T-join of G with minimum cardinality. We shall use the straightforward corollary of it and Guan's lemma: given a graph G and an edge subset J, one can decide in polynomial time if J is a join of G.

Together with the original setting it seems natural to test for NP-hardness a slightly more general, weighted version of the problem (WEIGHTED CUT PACKING): given a graph G with nonnegative edge weights w(e), find a collection of disjoint cuts $\{D_1, \ldots, D_k\}$ in G maximizing

$$\sum_{i=1}^{k} \max\{w(e) : e \in D_i\}.$$

CUT PACKING is equivalent to the special case of WEIGHTED CUT PACKING in which all weights w(e) are equal to 1. We shall demonstrate in the last section of this paper that WEIGHTED CUT PACKING is NP-hard even on cubic planar graphs.

2 Seymour Graphs and Star Contractions 

Since every bipartite graph is a Seymour graph and every other graph can be obtained 
by contraction of a bipartite graph, the family of Seymour graphs is not closed under 
the operation of contracting an edge. The operation that truly preserves the Seymour 
property is star contraction. 

Let G be an undirected graph. We say that a graph H is obtained from G by contracting a star if H is a result of contracting edges of some star in G.
If X is a vertex subset of a graph G then G/X will denote a graph which is obtained from G by identifying (shrinking) all vertices in X (after this shrinking new multiple edges and loops may appear). Notice that no edge is deleted from G and so G/X has the same set of edges as G.
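The shrinking operation can be sketched on an edge-list representation of a multigraph. In this illustration (the representation and the function name `shrink` are assumptions, not part of the paper) every endpoint lying in X is renamed to a single new vertex, so that edges are never deleted and may become parallel edges or loops.

```python
def shrink(edges, X, new_vertex):
    """Return the edge list of G/X: endpoints in X are identified with new_vertex."""
    inside = set(X)
    return [(new_vertex if u in inside else u,
             new_vertex if v in inside else v)
            for u, v in edges]


triangle = [(0, 1), (1, 2), (0, 2)]
shrunk = shrink(triangle, {0, 1}, "x")
# shrunk == [("x", "x"), ("x", 2), ("x", 2)]: one loop and two parallel edges
```

The i-th edge of the result corresponds to the i-th edge of G, which reflects the remark that G/X has the same set of edges as G.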

Lemma 4. Every graph obtained from a Seymour graph by contracting a star is a 
Seymour graph. 

Proof. Suppose to the contrary that a Seymour graph G has a vertex u such that contracting the star at u results in a non-Seymour graph H. We can equivalently think of H as the subgraph G/X − u of G/X where $X = N_G(u)$ (for an illustration see Fig. 1). Let ξ denote the vertex of G/X that is the result of shrinking the set X in G. Notice that by Theorem 1, G does not have any J′-critical subgraph for any join J′ of G. Since H is not a Seymour graph, by Theorem 1 it has a join J and a J-critical subgraph $S = C_1 \cup C_2$ of maximum degree 3, where $C_1$ and $C_2$ are J-saturated circuits. Note that J is also a join of G. Note first that ξ lies in S; otherwise S would be a J-critical subgraph of G, which by Theorem 1 would contradict the assumption that G is a Seymour graph. Let S′, $C_1'$, and $C_2'$ denote the subgraphs of G spanned by the edges of G lying in S, $C_1$, and $C_2$ respectively. Let $\{x_1, x_2, \ldots, x_k\}$ denote the subset of vertices of S′ lying in X. Recall that S has maximum degree 3 and hence, by the definition of J-critical subgraph, each vertex of S has degree either 2 or 3. It follows that 1 ≤ k ≤ 3. In fact 2 ≤ k ≤ 3, since k = 1 means that S′ is isomorphic to S, which implies that S′ is a J-critical subgraph of G, i.e. G is not a Seymour graph. Then at least one of the graphs $C_1'$ and $C_2'$ is a path. Recall that the $x_i$ are neighbours of u in G. For each i, let $e_i$ denote an edge connecting $x_i$ and u. Denote by T the subgraph of G obtained from S′ by adding the vertex u and the edges $e_i$. Our goal in the remaining part of the proof is to show that T is a J*-critical subgraph of G for some join J*. By the remark just after Theorem 1 this contradicts the assumption that G is a Seymour graph and thus proves the lemma. Observe first that S can be obtained from T by contracting the star at u. Thus, since S is non-bipartite, T is non-bipartite as well.




Fig. 1. A graph G and the graph G/X obtained from G by shrinking the vertex subset X = N_G(u) 
to a new vertex ξ 
Case 1: k = 2. 

Let J* = J ∪ {e_1}. Of the circuits C_1 and C_2 at least one passes through ξ. Assume 
first that, say, C_1 does, whereas C_2 does not. Then C'_1 is a path, whereas C'_2 is a circuit in 
G. Set C''_1 = C'_1 ∪ {e_1} ∪ {e_2}. Then, by construction, T = C''_1 ∪ C'_2 and, moreover, 
C''_1 and C'_2 are both J*-saturated circuits. Assume now that C'_1 and C'_2 are both paths. 
Then set C''_i = C'_i ∪ {e_1} ∪ {e_2}, i = 1, 2. Again, by construction, T = C''_1 ∪ C''_2, 
and moreover, C''_1 and C''_2 are both J*-saturated circuits. Thus, in either case T is a 
J*-critical subgraph of G. 

Case 2: k = 3 (for an illustration refer to Fig. 1). 

In this case ξ has degree 3 in S and, consequently, C_1 and C_2 both pass through ξ. 
Therefore C'_1 and C'_2 are both paths in G. Since S = C_1 ∪ C_2, it follows that of the 
three edges incident with ξ in S, exactly one lies in both C_1 and C_2. W.l.o.g. we may 
assume that this edge is incident with x_1 in G. Our assumption implies that x_1 is a 
common endpoint of C'_1 and C'_2 and we may assume further that the other endpoint of 
C'_1 is x_2 whereas that of C'_2 is x_3. Now set J* = J ∪ {e_1}, C''_1 = C'_1 ∪ {e_1} ∪ {e_2}, 
and C''_2 = C'_2 ∪ {e_1} ∪ {e_3}. Then, by construction, T = C''_1 ∪ C''_2 and moreover, C''_1 
and C''_2 are both J*-saturated circuits. It means that, again, T is a J*-critical subgraph 
of G. □ 

3 Joins Admitting Complete Packings and Star Contractions 

In this section, to make the paper self-contained, we give a proof of a folklore lemma 
which is one of the crucial points in our proof of Theorem 3. 

Let G be a graph. For v ∈ V(G), denote by G * v the graph that is obtained from G 
by contracting the star at v. Let J be a join of G. A vertex v ∈ V(G) is called J-marginal 
if v is incident with exactly one edge e in J and J \ {e} is a join of G * v. 

Lemma 5. Let G be a graph and J be a join of G. If J admits a complete packing, then 
G has a J-marginal vertex. 

Proof. Since J admits a complete packing, G has a collection of edge disjoint cuts 
such that each cut contains exactly one edge in J. Among all such collections choose a 
collection C = {δ(X_1), δ(X_2), ..., δ(X_k)} with minimum |X_1|. For each i, let J ∩ δ(X_i) = 
{e_i}. Let v be the end of e_1 that lies in X_1. We claim that X_1 = {v}. Assume to the 
contrary that X_1 \ {v} ≠ ∅. We show first that for any i ≠ 1, X_1 ∩ X_i is either ∅ or X_1. 
Assume not, that is 0 < |X_1 ∩ X_i| < |X_1| for some i. Since e_1 ∉ δ(X_i), either both 
ends of e_1 lie in X_i or both not; for the similar reason the same holds with respect to 
e_i and X_1. Note that we may assume that both ends of e_1 lie in X_i, since the replacement 
of the set X_i by its complement does not alter the collection C. Assume that both ends 
of e_i do not lie in X_1. Set X'_1 = X_1 ∩ X_i and X'_i = X_1 ∪ X_i. It is easy to check that 
δ(X'_1) ∩ δ(X'_i) = ∅, δ(X'_1) ∪ δ(X'_i) = δ(X_1) ∪ δ(X_i), and e_1 ∈ δ(X'_1), e_i ∈ δ(X'_i). 
Replace in the collection C the cuts δ(X_1) and δ(X_i) by δ(X'_1) and δ(X'_i) respectively. 
This yields a new collection C' = {δ(X'_1), δ(X_2), ..., δ(X'_i), ..., δ(X_k)} with the 
same properties except that |X'_1| < |X_1|, which contradicts the choice of C. Assume 
now that both ends of e_i lie in X_1. Then we set X'_1 = X_1 ∩ X̄_i and X'_i = X_1 ∪ X̄_i, 
where X̄_i denotes the complement of X_i. 
Similarly, δ(X'_1) ∩ δ(X'_i) = ∅, δ(X'_1) ∪ δ(X'_i) = δ(X_1) ∪ δ(X_i), and, unlike the above, 
e_1 ∈ δ(X'_i), e_i ∈ δ(X'_1). This implies that, again, the collection C can be replaced by 
another collection having the same properties except that it includes the cut δ(X'_1) with 
|X'_1| < |X_1|, which is a contradiction to the choice of C. Thus we have proved that for 
each i ≠ 1, either X_1 ∩ X_i = ∅ or X_1 ⊆ X_i. It immediately follows that X_1 = {v}, since 
otherwise we could replace in C the cut δ(X_1) by the star cut δ(v). Set S = N_G(v) ∪ {v}. 
Now consider the graph G * v and denote by ω the vertex of G * v that is the result 
of contracting the star at v in G. Since the cuts in C are edge disjoint, we have either 
S ∩ X_i = ∅ or S ⊆ X_i for each i ≠ 1. Now for each i ≠ 1, set X*_i = (X_i \ S) ∪ {ω} if 
S ⊆ X_i and X*_i = X_i otherwise. It follows that C* = {δ(X*_2), δ(X*_3), ..., δ(X*_k)} is 
a collection of disjoint cuts of G * v with e_i ∈ δ(X*_i) for each i ≠ 1. This implies that 
J \ {e_1} is a join in G * v and C* is a complete packing of J \ {e_1}, as desired. □ 

4 Proof of Theorem 3 

We describe a polynomial-time algorithm that, given a Seymour graph G and a join J 
of G, finds a complete packing of J. 

Let G be a graph and J be a join of G. If v is a J-marginal vertex of G, then e(v) 
will denote the (unique) edge in J that is incident with v. The algorithm runs through k 
similar steps where k = |J|, and outputs a complete packing {D_1, D_2, ..., D_k}. 

Step 0. Set G_1 := G, J_1 := J. 

Step i, 1 ≤ i ≤ k. Look successively through the vertices of G_i to find a J_i-marginal 
vertex v_i of G_i. To check if a vertex v of G_i is J_i-marginal, call one of the known 
polynomial-time algorithms that can test if J_i \ {e(v)} is a join of G_i * v. Set D_i to be 
equal to the set of edges of the star at v_i in G_i. Set G_{i+1} := G_i * v_i, J_{i+1} := J_i \ {e(v_i)}. 
If i < k go to step i + 1, otherwise stop. 
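
The steps above can be sketched as follows. This is only an illustration under stated assumptions: the join test is passed in as a black-box function is_join (a hypothetical parameter standing for one of the polynomial-time T-join based tests), the graph is an edge list of (label, u, v) triples, and contracted vertices get hypothetical labels ("w", i). The result is a complete packing only under the hypotheses of the text (a Seymour graph and a correct join test).

```python
def star_contract(edges, v, new_vertex):
    """Contract the star at v; return (labels of star edges, remaining edge list)."""
    star = {label for label, a, b in edges if v in (a, b) and a != b}
    merged = {v} | {a if b == v else b for label, a, b in edges if label in star}

    def image(x):
        return new_vertex if x in merged else x

    # Star edges are contracted away; all other edges survive (possibly as loops).
    rest = [(label, image(a), image(b)) for label, a, b in edges if label not in star]
    return star, rest


def pack_join(edges, join, is_join):
    """Repeatedly find a marginal vertex, output its star, and contract it."""
    edges, join = list(edges), set(join)
    packing, step = [], 0
    while join:
        step += 1
        vertices = sorted({x for _, a, b in edges for x in (a, b)}, key=str)
        for v in vertices:
            incident = [label for label, a, b in edges
                        if v in (a, b) and a != b and label in join]
            if len(incident) != 1:
                continue
            star, contracted = star_contract(edges, v, ("w", step))
            remaining = join - {incident[0]}
            if is_join(contracted, remaining):      # v is marginal
                packing.append(star)
                edges, join = contracted, remaining
                break
        else:
            raise ValueError("no marginal vertex found")
    return packing


# Hypothetical example: the path a-b-c-d. A tree has no circuits, so every
# edge set is a join and the trivial test below is valid for this input only.
path = [("ab", "a", "b"), ("bc", "b", "c"), ("cd", "c", "d")]
packing = pack_join(path, {"ab", "cd"}, lambda edges, join: True)
```

On this input the sketch outputs the star at a and then the star at c in the contracted graph, each containing exactly one edge of the join.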

We now establish the correctness of the algorithm. By Lemma 4 each graph G_i is 
a Seymour graph, since each was obtained from the Seymour graph G by a sequence 
of star contractions. Using this, Lemma 5 and a simple induction on i we further obtain 
that, for each i = 1, ..., k, J_i is a join admitting a complete packing in G_i. Moreover, 
by Lemma 5, for each i = 1, ..., k, G_i does contain a J_i-marginal vertex and thus the 
algorithm never terminates before step k. Next, it is clear from the description that D_i 
and D_j are disjoint whenever i ≠ j. And finally, since every cut of G_i is a cut of G, each 
D_i is a cut of G. □ 

It is clear that the running time of the algorithm is O(n^2 G(m, n)) where n = |V(G)|, 
m = |E(G)| and G(m, n) is the running time of the chosen procedure to test if a given set 
of edges is a join. As such, one can use any of the known algorithms for finding a minimum 
cardinality T-join: the original O(n^3)-algorithm of Edmonds and Johnson ([7]), the 
O(mn log n)-algorithm of Barahona et al. ([4], [6]) or an O(n^{3/2} log n)-algorithm for 
the special case of planar Seymour graphs ([16], [10], [5]). 

As was already mentioned above, the running time of Frank’s algorithm for finding 
a join of maximum cardinality is O(mn) [9]. Thus, if we use, e.g., the algorithm of 
Barahona et al. ([4], [6]) for finding a minimum cardinality T-join, the overall running 
time of our algorithm for solving CUT PACKING on Seymour graphs will be bounded 
by O(mn^3 log n). 
5 NP-hardness Result 

Theorem 6. WEIGHTED CUT PACKING is NP-hard on cubic planar graphs. 

Proof. We use a reduction from the following NP-complete decision problem [17]: given 
a graph G and a join J of G, decide if J admits a complete packing in G. Assign 
(0, 1)-weights w to the edges of G in the following way: w(e) = 1 if e ∈ J and w(e) = 0 
otherwise. Let {D_1, D_2, ..., D_k} be a collection of edge disjoint cuts in G of maximum 
total w-weight. We may assume that each D_i in the collection has weight 1 and thus the 
total weight of the collection is k. We claim then that k = |J| if and only if J admits 
a complete packing. Indeed, if k = |J| then {D_1, D_2, ..., D_k} is a complete packing 
of J. On the other hand, if J admits a complete packing {D'_1, D'_2, ..., D'_{|J|}} then this 
packing has weight |J|. However, any collection of disjoint cuts in G has weight at most 
|J| and so k = |J|. □ 

Acknowledgements. The author is grateful to Andras Frank for helpful comments and 
an anonymous referee for pointing out the related results on DICUT PACKING. 
Abstract. Given a set 𝒫 of m subpaths of a length n path P, Gyori’s theorem gives 
a min-max formula for the smallest set 𝒢 of subpaths, the so-called generators, such 
that every element of 𝒫 arises as the union of some members of 𝒢. We present an 
implementation of Frank’s algorithm for a generalized version of Gyori’s problem 
that applies to subpaths of cycles and not just paths. The heart of this algorithm is 
Dilworth’s theorem applied to a specially prepared poset. 

- We give an O(n^{2.5} √(log n) + m log n) running time bound for Frank’s 
algorithm by deriving non-trivial bounds for the size of the poset passed to 
Dilworth’s theorem. Thus we give the first practical running time analysis for 
an algorithm that applies to subpaths of a cycle. 

- We compare our algorithm to Knuth’s O((n + m)^2) time implementation 
of an earlier algorithm that applies to subpaths of a path only. We apply 
a reduction to the input subpath set that reduces Knuth’s running time to 
O(n^2 log^2 n + m log n). We note that derivatives of Knuth’s algorithm seem 
unlikely to be able to handle subpaths of cycles. 

- We introduce a new “cover edge” heuristic in the bipartite matching algorithm 
for Dilworth’s problem. Tests with random input indicate that this heuristic 
makes our algorithm (specialized to subpaths of a path) outperform Knuth’s 
one for all except the extremely sparse (m ≈ n/2) inputs. Notice that Knuth’s 
algorithm (with our reduction applied) is better by a factor of approximately 
√n in theory. 



1 Introduction 

Gyori’s theorem [9] gives a min-max formula for the following problem. Let P be a path 
(or a cycle, as in an extension of the theorem due to Frank and Jordan [6]); let a collection 
𝒫 of its subpaths be given. Then we want to find another collection 𝒢 of subpaths, the 
so-called generators, such that each element of 𝒫 arises as the union of some paths in 
𝒢. Note that the problem for subpaths of a tree is known to be NP-complete [14]. The 
theorem is often stated in the terminology of horizontally convex rectilinear bodies [9] 
or interval systems [13] that can easily be transformed to our terminology. 
Frank and Jordan [6] find a remarkable connection of Gyori’s problem to a large 
collection of so-called connectivity augmentation problems. Some of these problems 
admit algorithms very similar to the one presented in the paper; most of them are based 
on Dilworth’s min-max theorem for posets. A poset (partially ordered set) is a set with a 
transitive comparison relation <. The relation is not necessarily complete: incomparable 
elements x, y satisfy neither x < y nor y < x. Dilworth’s theorem states that the 
maximum number of pairwise incomparable elements of a poset is equal to the minimum 
number of chains (some k elements with x_1 < x_2 < ... < x_k) such that each element 
of the poset belongs to at least one chain. Dilworth’s theorem lies at the heart of Frank’s 
algorithm [5]; as it will turn out, it is the computational bottleneck both in theory and 
in practice. We implement and analyze the Frank-Jordan extension of Gyori’s theorem 
and in particular give efficient algorithms for Dilworth’s theorem. 

1.1 Previous Results 

Franzblau and Kleitman [7] turned Gyori’s proof into a polynomial time algorithm for 
finding the minimum and maximum in question. This algorithm was simplified by Knuth 
[13] who implemented the algorithm and derived an O((n + m)^2) running time bound for 
m subpaths of a path of n edges. Knuth’s algorithm has no simple extension for subpaths 
of cycles. Frank and Jordan gave the first (non-combinatorial) algorithm [6] that applies 
to subpaths of a cycle as well; that algorithm was then turned into a combinatorial one by 
Frank [4,5]. At the center of both algorithms we find Dilworth’s theorem for a certain 
poset derived from the input set of subpaths. 

The only known algorithm to find the minimum number of chains and the maximum 
incomparable subset in Dilworth’s theorem uses a reduction to a bipartite matching 
problem (Ford and Fulkerson [8]). For a poset of p elements where the number of 
comparable pairs is q, the resulting bipartite graph has 2p vertices and q edges; the 
maximum matching can then be found in O(p^{2.5}) time by the Hopcroft-Karp matching 
algorithm [10]. The hunt for a better bipartite matching algorithm remains open [12]. 

1.2 Our Results 

We describe an implementation of Frank’s algorithm [5] for Gyori’s theorem and improve 
Frank’s O(n^ + m) running time bound to O(n^{2.5} √(log n) + m log n). Our algorithm, 
unlike Knuth’s one [13], uses the extension of Gyori’s theorem to cycles [6] and applies 
to subpaths of a cycle instead of a path. 

Our new algorithms for Dilworth’s theorem are of independent interest. Instead of taking the 
q-element set of all comparable pairs, our algorithms take the typically much smaller set of 
q̄ “cover” or immediate successor pairs typically used to describe a poset. In the usual 
poset terminology an element (J, f) covers another (I, e), denoted by (I, e) ≺ (J, f), 
if (I, e) < (J, f) and no element (I', e') satisfies (I, e) < (I', e') < (J, f); the cover 
relation gives the so-called Hasse diagram, the typical description of the poset. 
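
As an illustration of how much smaller the cover relation can be, the following minimal sketch extracts the cover pairs from a transitively closed set of comparable pairs. The function name and the divisibility example are assumptions made for this illustration, and the cubic brute-force test is chosen for clarity, not efficiency.

```python
def cover_pairs(less):
    """Return the cover pairs of a poset.

    less -- set of pairs (x, y) meaning x < y, assumed transitively closed
    """
    elements = {x for pair in less for x in pair}
    # (x, y) is a cover pair if no element z lies strictly between x and y.
    return {(x, y) for (x, y) in less
            if not any((x, z) in less and (z, y) in less for z in elements)}


# Hypothetical example: divisibility on {1, 2, 3, 6, 12}.
ground = [1, 2, 3, 6, 12]
less = {(a, b) for a in ground for b in ground if a != b and b % a == 0}
covers = cover_pairs(less)
```

In this example there are 9 comparable pairs but only 5 cover pairs (the edges of the Hasse diagram).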

Our running time analysis consists of non-trivial combinatorial proofs. We give 
tight bounds for the poset size derived from the input subpath system. We prove that 
an initial reduction may discard all except O(n log n) subpaths from the input without 
altering the output. This reduction also applies in Knuth’s algorithm [13] and improves its 
running time to O(n^2 log^2 n + m log n); this becomes the theoretically fastest algorithm 
if subpaths of a path are considered. 

We test our implementation with random input. We specialize our algorithm to 
subpaths of a path and compare its performance with Knuth’s algorithm. In practice our 
algorithm turns out better apart from extremely sparse problems with n ≈ 2m (note that 
we test Knuth’s algorithm with our initial subpath set reduction). We believe this is due 
to our improved way of applying Dilworth’s theorem: while in theory we were not able 
to show q̄ ≪ q in the posets arising during the algorithm, our tests indicate this may be 
the case, at least for average or random sets of subpaths. 



1.3 The Gyori-Frank-Jordan Theorem 

Frank and Jordan [6] prove an extended version of Gyori’s theorem for subpaths of a 
cycle C. They introduce the following reformulation of the theorem that we also use in 
the bulk of the paper. Consider the set of subpath-edge pairs (I, e) where I ∈ 𝒫 and 
e ∈ I is an edge of cycle C. Notice that a set of subpaths 𝒢 generates 𝒫 iff for all (I, e) 
there is an element G ∈ 𝒢 with e ∈ G ⊆ I; we say that G generates the pair (I, e). We 
may restrict attention to “minimal copies” of subset-edge pairs: if I ⊂ J, then it suffices 
to generate (I, e) since then (J, e) is also generated. In Frank’s [5,4] terminology, (I, e) 
is essential if there is no other subpath J ∈ 𝒫 with e ∈ J ⊂ I. 
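
For subpaths of a path (not a cycle) both notions are easy to state in code. The following is a minimal brute-force sketch, assuming a subpath is encoded as a pair (first_edge, last_edge) of inclusive edge indices; the function names and the example input are assumptions made for this illustration.

```python
def generates(generators, paths):
    """True if every pair (I, e) has a generator G with e in G and G inside I."""
    return all(any(a <= c <= e <= d <= b for (c, d) in generators)
               for (a, b) in paths for e in range(a, b + 1))


def essential_pairs(paths):
    """Return all pairs (I, e) such that no other input path J has e in J and J inside I."""
    paths = set(paths)
    result = set()
    for (a, b) in paths:
        for e in range(a, b + 1):
            dominated = any((c, d) != (a, b) and a <= c <= e <= d <= b
                            for (c, d) in paths)
            if not dominated:
                result.add(((a, b), e))
    return result


# Hypothetical example on a path with edges 0..5.
paths = {(0, 5), (0, 3), (2, 5)}
essential = essential_pairs(paths)
ok = generates({(0, 1), (2, 3), (4, 5)}, paths)
bad = generates({(0, 3)}, paths)
```

In the example no pair of the long path (0, 5) is essential, since each of its edges also lies in one of the two shorter paths.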

Let ℰ denote the set of essential subset-edge pairs. If (I, e) and (J, f) ∈ ℰ cannot 
be generated by the same subpath G, then we call them independent. Independence 
implies one of the following cases: either e ∈ I − J, or f ∈ J − I, or I ∩ J consists 
of two disconnected components and e and f lie in different components (this may 
happen if I ∪ J = C). By further investigation we find two mutually exclusive cases for 
non-independent elements (I, e) and (J, f) ∈ ℰ (see Fig. 1): 

Comparable. We say that (I, e) < (J, f) if e and f belong to the same connected 
component of I ∩ J and the following four edges (possibly some of which are 
equal) follow in clockwise order: the first edge of J, edge f, edge e, and the last 
edge of I. The elements of ℰ form a poset with the above comparison relation. 
Crossing. If two non-independent elements (I, e) and (J, f) of ℰ are non-comparable, 
then we say they are crossing; crossing is clockwise if the first edge of J, the first 
edge of I, f and e follow in clockwise order. 



Fig. 1. Two subpaths I, J ⊆ C. J may or may not wrap around and consequently I ∩ J may 
or may not consist of two connected components. The subpath-edge pairs (I, e_1) and (I, e_2) 
are independent of all (J, f_i) for i = 1, 2, 3; similarly (J, f_3) is independent of all (I, e_i) for 
i = 1, 2, 3. As for the remaining arrangements, (J, f_1) < (I, e_3); (I, e_3) clockwise crosses (J, f_2); 
and (J, f_2) anti-clockwise crosses (I, e_3). 
The size of a set of pairwise independent subpath-edge pairs is a clear lower bound 
on the size of a generator subpath set. The Gyori-Frank-Jordan min-max theorem claims 
the opposite is true as well: 

Theorem 1 (Gyori [9], Frank and Jordan [6]). The minimum number of subpaths that 
generate a given set of subpaths 𝒫 of a cycle C is equal to the maximum number of 
pairwise independent subpath-edge pairs (I, e) with e ∈ I ∈ 𝒫. 



1.4 Frank’s Algorithm 

Frank’s algorithm [5] is a four-phase procedure (see Fig. 2) that finds a minimum 
generator set for subpaths of a cycle C. The algorithm is based on the following observation [6]: 
a given chain (I_1, e_1) < (I_2, e_2) < ... < (I_k, e_k) can be generated by a single subpath 
between e_k and e_1. Now one might want to apply Dilworth’s theorem over the 
essential set ℰ to obtain an optimum chain decomposition that defines an optimum generator 
system. Unfortunately, however, the notion of incomparability under the partial order < 
is weaker than the notion of independence: two crossing elements are incomparable but 
they always possess a common generator subpath. 

The main idea now in Frank’s [5,4] algorithm is to remove crossing elements and 
thus make independent and incomparable mean the same. After some initialization steps 
described in Section 2, Phase 2 of Frank’s algorithm (Fig. 2) constructs a cross-free set 𝒦 
by considering pairs in ℰ and removing one of them if they cross. Then Phase 3 applies 
Dilworth’s theorem to 𝒦 to construct generators 𝒢 as well as a pairwise independent 
subset ℐ of 𝒦 such that 𝒢 and ℐ have the same size. 

Phase 3 (Fig. 2) thus yields a set 𝒢 that generates 𝒦 but not all of ℰ. In the final 
Phase 4 we identify all (J, f) not generated by 𝒢. As the main result, Frank [5] gives a 
simple procedure to adjust 𝒢 so that it will generate (J, f) without increasing its size or 
“un-generating” previously generated elements. 

In order for Frank’s theorem to hold, we must follow a specific rule to construct 
the “cross-free” 𝒦: We consider each (I, e) ∈ 𝒦 and discard all subpath-edge pairs that 
cross (I, e). For now the order of the (I, e) is arbitrary; later we choose a certain order 
that makes the algorithm efficient. Finally for the last “correction” phase to work, we 
must remember the order of deletions and consider discarded elements in reverse order; 
for each (J, f) ∈ ℰ that we discard, we also need to record the element (I, e) causing 
its deletion: 

Lemma 2 (Frank [5]). Let (I, e) be the essential element causing the removal of (J, f) 
from 𝒦. (i) Neither (I, f) nor (J, e) may have been removed from 𝒦 prior to (J, f). 
(ii) If G_1 generates (I, f) and G_2 generates (J, e) and neither G_1 nor G_2 generates 
(J, f), then they satisfy G_2 ⊆ G_1. (iii) For G_1 and G_2 as above, if we define the 
subpaths G'_1 and G'_2 by exchanging the endpoints of G_1 and G_2, then one of G'_1 and 
G'_2 generates (J, f). (iv) If (K, g) is a subset-edge pair generated by G_1 or G_2, then 
(K, g) is generated by G'_2 if (a) g is clockwise after e or (b) g = e but K precedes I, and 
(c) by G'_1 in all other cases (see Fig. 3). □ 
Phase 1  preprocess the input 
         𝒦, ℰ ← essential subpath-edge pairs 
Phase 2  for all (I, e) ∈ 𝒦 do 
           for all (J, f) ∈ 𝒦 do 
             if (I, e) and (J, f) cross 
             then mark (J, f) by (I, f), (J, e) and delete (J, f) from 𝒦 
Phase 3  call Dilworth(𝒦) to find a pairwise independent set ℐ 
         and subpaths 𝒢 generating 𝒦 
Phase 4  for all (J, f) ∈ ℰ − 𝒦 in the reverse order of deletion do 
           (I, f), (J, e) ← marks of (J, f) 
           find G_1 ∈ 𝒢 generating (I, f) and G_2 ∈ 𝒢 generating (J, e) 
           if neither G_1 nor G_2 generate (J, f) 
           then exchange the endpoints of G_1 and G_2 in 𝒢 

Fig. 2. Frank’s algorithm 

Fig. 3. The scenario of Lemma 2. 



2 Our Implementation 

Unfortunately the straight implementation of Algorithm 2 has a poor time complexity. 
The set of all suhset-edge pairs (even after removing duplicates) may have size J? (n^ ) and 
even \£\ = Thus the best runtime bound Frank [5] achieves is 0{n^) for Phase 1 

and O(n^) for Phase 2. Next we sketch our implementation where the computational 
bottleneck is an algorithm for Dilworth’s theorem on a poset. All additional work will 
be bounded by a term 0{nm + n? log^ n). Due to space limitations we only sketch our 
implementation; for details see the full paper [2], 

Initial reduction. We save a large amount of work by a simple preprocessing step that, 
in O(m log n) time, deletes all but O(n log n) subpaths from the input without altering 
the optima in question. Observe that if I = ∪_i I_i for I_i ∈ 𝒫, then the generators for the 
I_i will generate I and thus I may be removed from the input. We quickly identify and 
discard a vast amount of such subpaths I. 

For simplicity we explain the reduction for subpaths of paths only. We use a 
divide-and-conquer approach: We consider the first n/2 edges P_1 ⊆ P and recursively reduce 
the set of subpaths {I : I ∈ 𝒫 and I ⊆ P_1}. We do the same for the last n/2 edges P_2. 
Fig. 4. Left: the instance 𝒫_3; a sub-instance 𝒫_2 is within the shaded area. Right: the recursive way 
to build the instances. On both sides of the figure thick subset-edge pairs are all essential and no 
two of them cross. 



Now take the subset 𝒫' ⊆ 𝒫 of paths that are subpaths of neither P_1 nor P_2. Notice 
that I ∈ 𝒫' can be removed unless it is the shortest element of 𝒫' either starting or ending at a 
certain edge of P: otherwise I is the union of the shortest element of 𝒫' starting at its first 
edge and the shortest one ending at its last edge, both of which cross the midpoint. By solving 
the recurrence for the above recursive algorithm, we get 
the next claim (an instance for the tightness of the claim is given in Fig. 4). 
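
The divide-and-conquer reduction for subpaths of a path can be sketched as follows. Subpaths are again encoded as inclusive (first_edge, last_edge) pairs; the function name and this encoding are assumptions made for the illustration, and the sketch uses plain Python sets rather than the sorted input of the actual implementation.

```python
def reduce_paths(paths, lo, hi):
    """Return a subset of `paths` such that every discarded path is a union of kept ones.

    paths  -- set of (first_edge, last_edge) pairs, all within the edge range lo..hi
    """
    paths = set(paths)
    if not paths:
        return set()
    if lo == hi:
        return paths                      # only the single-edge path can occur
    mid = (lo + hi) // 2                  # left half lo..mid, right half mid+1..hi
    left = {p for p in paths if p[1] <= mid}
    right = {p for p in paths if p[0] > mid}
    crossing = paths - left - right       # subpaths of neither half

    shortest_from, shortest_to = {}, {}
    for a, b in crossing:
        if a not in shortest_from or b < shortest_from[a][1]:
            shortest_from[a] = (a, b)     # shortest crossing path starting at a
        if b not in shortest_to or a > shortest_to[b][0]:
            shortest_to[b] = (a, b)       # shortest crossing path ending at b
    kept = set(shortest_from.values()) | set(shortest_to.values())

    return reduce_paths(left, lo, mid) | reduce_paths(right, mid + 1, hi) | kept


# Hypothetical example on edges 0..5: (0, 5) is the union of (0, 3) and (2, 5).
reduced = reduce_paths({(0, 5), (0, 3), (2, 5), (0, 1), (4, 4)}, 0, 5)
```

Each path is handled at one level of the recursion as a crossing path, and at every level at most two crossing paths per edge survive, which is the source of the O(n log n) bound.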

Theorem 3. There are O(n log n) subpaths of a cycle C of length n such that no subpath 
arises as the union of some other two of these subpaths. We may find such a subset in 
O(m log n) time. □ 

Phase 1. In Phase 1 a natural data structure of the essential set ℰ is constructed in O(nm) 
time. First, however, we contract all consecutive pairs of cycle-edges such that no path 
starts or ends at their common vertex (a straightforward O(n + m) time procedure); then 
we sort the input so that the first edges follow clockwise; ties are broken so that the last 
edges follow anti-clockwise. Our implementation uses Quicksort; however an improved 
O(n + m) time can be achieved by radix sort [3]. 

We define the following natural 2D list data structure for ℰ. We visualize the 
subpath-edge pairs in ℰ as a table with rows corresponding to subpaths and containing all edges 
of a fixed subpath, while columns correspond to edges and contain all subpaths 
containing a fixed edge (see Fig. 5). The fact that this table is drawn on a torus, 
however, makes this kind of visualization somewhat inappropriate. The actual data structure 
consists of two collections of doubly linked lists. The first collection contains one list 
for each subpath I (row lists) while the second collection contains one list for each edge e ∈ C 
(column lists). For each list we may access the first and last elements. The size of the 
data structure is (due to possible empty lists) O(|ℰ| + m). 
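
A minimal sketch of such a 2D list follows. The class and attribute names are assumptions made for this illustration; dictionaries stand in for the arrays of list heads, and the wraparound over the cycle is ignored.

```python
class Node:
    """One subpath-edge pair, linked into its row list and its column list."""

    def __init__(self, path, edge):
        self.path, self.edge = path, edge
        self.row_prev = self.row_next = None
        self.col_prev = self.col_next = None


class Table:
    def __init__(self):
        self.row_first, self.row_last = {}, {}   # per subpath
        self.col_first, self.col_last = {}, {}   # per edge

    def append(self, path, edge):
        """Add (path, edge) at the end of both lists in O(1) time."""
        node = Node(path, edge)
        tail = self.row_last.get(path)
        node.row_prev = tail
        if tail is None:
            self.row_first[path] = node
        else:
            tail.row_next = node
        self.row_last[path] = node
        tail = self.col_last.get(edge)
        node.col_prev = tail
        if tail is None:
            self.col_first[edge] = node
        else:
            tail.col_next = node
        self.col_last[edge] = node
        return node

    def delete(self, node):
        """Unlink a node from both lists in O(1) time."""
        if node.row_prev is None:
            self.row_first[node.path] = node.row_next
        else:
            node.row_prev.row_next = node.row_next
        if node.row_next is None:
            self.row_last[node.path] = node.row_prev
        else:
            node.row_next.row_prev = node.row_prev
        if node.col_prev is None:
            self.col_first[node.edge] = node.col_next
        else:
            node.col_prev.col_next = node.col_next
        if node.col_next is None:
            self.col_last[node.edge] = node.col_prev
        else:
            node.col_next.col_prev = node.col_prev

    def row(self, path):
        node, out = self.row_first.get(path), []
        while node is not None:
            out.append(node.edge)
            node = node.row_next
        return out

    def column(self, edge):
        node, out = self.col_first.get(edge), []
        while node is not None:
            out.append(node.path)
            node = node.col_next
        return out


table = Table()
table.append("I", 1)
table.append("I", 2)
shared = table.append("J", 2)
table.append("J", 3)
before = table.column(2)
table.delete(shared)
after = table.column(2)
```

Constant-time deletion given a node is what Phase 2 relies on when crossing pairs are removed from the table.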

Now we construct the essential set by parsing each edge e ∈ C in a cyclic order 
and considering all subpaths I_1, ..., I_k starting at this edge e. Then in an inner loop 
we select all edges f and consider all subpaths J_1, ..., J_ℓ starting at edge f. Now we 
choose all pairs J_j ⊆ I_i; no subset-edge pair (I_i, g) with g ∈ J_j may be essential. Since 
for each I_i it suffices to select the longest J_j ⊆ I_i, we may perform this procedure by 
merging the sequences I_1, ..., I_k and J_1, ..., J_ℓ sorted by the endpoints. For a fixed e 
and f we use time O(k + ℓ); this totals to O(nm). 

If we are careful in selecting e and f, it is easy to insert the essential (I_i, f) into the 
2D list data structure. We scan f in a linear order starting from e (the first edge of all 
I_i). For each I_i we maintain a pointer to the first possible edge g ∈ I_i so that (I_i, g) may 
be essential; whenever we find J_j ⊆ I_i, we increase this pointer after the last edge of 
J_j. Under this scenario if the pointer of I_i is equal to f after processing all J_j starting 
Fig. 5. An instance of a data structure (with solid arrows as pointers drawn only in one direction) 
for the subset-edge pairs of ℰ where the last edge over subpath wraps around. Light shaded 
elements all cross some of (I_i, e) for i = 1, 2, 3, 4 (dark shaded); for these elements dotted arrows 
indicate one of their marks after deletion. 



at f, then (I_i, f) is essential: we insert it into the data structure and increase the pointer 
to the next edge. Insertion takes O(1) time since (I_i, f) is always added to the end of 
both linked lists (apart from the effect of the wraparound over cycle C that can easily 
be handled as an exception). Hence Phase 1 takes O(nm) time. 

Phase 2. We are able to construct the cross-free set 𝒦 very efficiently by choosing a 
specific order for (I, e) in the loop of Phase 2: we select each edge e in a cyclic 
order and then read the column list of e containing all (I, e). In order to find all (J, f) 
crossing (I, e), we take advantage of Frank’s Lemma 2 that claims (J, e) may not have 
been deleted prior to (J, f). Hence all (J, f) to be deleted can be found by parsing the 
row lists of elements (J, e) found on the column list of e. 

Our next idea is to process all (I, e) for a fixed e in parallel. We visit the row lists 
starting at each (J, e) in both directions. It is easy to see that the elements to be deleted 
form consecutive sequences over these lists; it is also not hard to determine which (I, e) 
causes the deletion of a certain (J, f). Deletions from the lists take O(1) time; we spend 
an additional O(1) time to determine that no more elements should be deleted from a 
row list. Hence we use O(|ℰ| + n^2) time where an easy bound on |ℰ| is n^2. Phase 2 thus 
takes O(n^2) time. 

Phase 3. In order to use an algorithm for Dilworth’s theorem, we have to construct all 
cover pairs (the Hasse diagram) of the poset 𝒦. We consider row lists for subpaths I in 
the data structure for 𝒦; we proceed with the I so that the first edges of the I follow the cyclic 
order. For two consecutive list members (I, e_1) and (I, e_2) all covers (J, f) for (I, e_1) 
satisfy that f is clockwise between e_1 and e_2; for such an f we get (J, f) ≻ (I, e_1) 
only if (J, f) is the last in the column list of f with J preceding I. By maintaining the 
value of these last (J, f) for each f, we are able to find the possible cover (J, f) for a 
fixed f in O(1) time; thus the total time spent for all subpaths starting at a given edge is 
proportional to the length of the longest such path. This gives a running time of O(n^2) 
for this phase, in addition to the time of our Dilworth subroutine of Section 3. 

Phase 4. Assume we delete (J, f) because of (I, e) in Phase 2; it gets the labels (J, e) 
and (I, f). By Lemma 2 if we know the generators for these latter two elements, then 
we may adjust the generator set for each non-generated (J, f) in O(1) time. Since the 
generator set keeps changing, we still need an efficient way to update the generators for 
all generated elements. 

Let us consider the column list (I, e) of a fixed edge e. Then all (K, g) keep the 
endpoint of the generator if g is right from e on subpath K and keep the starting point 
otherwise (see Fig. 3). Thus in this case we may find all generators in O(1) time if we 
maintain their set doubly sorted. Since we take the e in anti-clockwise order (opposite 
of Phase 2), we only need to update the generator set for the current column list (I, e); 
this takes O(n) time if we use counting sort [3] to sort the generator set. All remaining 
work can be bounded in the same way as in Phase 2; the running time totals to O(n^2). 

3 Algorithms for Dilworth’s Problem 

In this section we give various algorithms for Dilworth’s problem where the running 
time depends on q̄, the size of the cover relation or the Hasse diagram of the poset. For a 
poset of p elements, the number q of comparable pairs typically satisfies q ≫ q̄. We give a 
simple O(q̄p)-time algorithm that we use in our implementation; we give a randomized 
O(q̄ √(p log p))-time¹ one that applies to special posets only but uses no elaborate data 
structures (this algorithm might outperform our implementation in practice); finally we 
give a general deterministic one with the same time bound but using the Sleator-Tarjan 
dynamic tree data structure [15] (this algorithm is likely of little practical utility [1]). 

The only previously known algorithm for Dilworth’s problem is a straightforward 
reduction to a bipartite matching problem. We make two copies 𝒦_1 and 𝒦_2 of the ground 
set 𝒦 of the poset and for each pair x < y of 𝒦 add an edge between the copy of x in 𝒦_1 
and the copy of y in 𝒦_2. Then the edges of a maximum matching in this graph determine 
a minimum chain partition of the poset. For a poset of p members and q comparable 
pairs, we may thus find a chain partition in O(pq) time by the alternating path matching 
algorithm [3] or in O(q√p) time by the Hopcroft-Karp matching algorithm [10]. 
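
The reduction can be sketched with the basic alternating path matching algorithm as follows. The function name and the divisibility example are assumptions made for this illustration; this is the plain O(pq) variant over all comparable pairs, not the cover-based algorithm developed in this section.

```python
def min_chain_partition(elements, less):
    """Partition a poset into a minimum number of chains.

    elements -- list of poset elements
    less     -- transitively closed set of pairs (x, y) meaning x < y
    """
    succ = {x: [y for y in elements if (x, y) in less] for x in elements}
    match = {}   # match[y] = x means: y directly follows x in some chain

    def augment(x, seen):
        # Search for an alternating path from the unmatched left copy of x.
        for y in succ[x]:
            if y in seen:
                continue
            seen.add(y)
            if y not in match or augment(match[y], seen):
                match[y] = x
                return True
        return False

    for x in elements:
        augment(x, set())

    # Matched edges link consecutive chain elements; unmatched right copies start chains.
    next_of = {x: y for y, x in match.items()}
    chains = []
    for x in elements:
        if x not in match:
            chain = [x]
            while chain[-1] in next_of:
                chain.append(next_of[chain[-1]])
            chains.append(chain)
    return chains


# Hypothetical example: divisibility on {1, 2, 3, 6, 12}; {2, 3} is a largest antichain.
ground = [1, 2, 3, 6, 12]
less = {(a, b) for a in ground for b in ground if a != b and b % a == 0}
chains = min_chain_partition(ground, less)
```

The number of chains equals p minus the size of the matching, and here it matches the size of the largest antichain, as Dilworth’s theorem predicts.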

Our first algorithm modifies the basic alternating path bipartite matching algorithm, 
in which one augments a matching M along a path whose edges are alternately inside and 
outside M. We extend the notion of an alternating path such that between two edges of 
M we allow arbitrary sequences of covers instead of a single (transitive) pair. This type 
of alternating path can be found by breadth-first search (BFS) [3] in the graph with 
vertices 𝒦_1 and 𝒦_2 whose edges are (i) all matching edges directed from 𝒦_2 towards 
𝒦_1; (ii) all covers x ≺ y with x ∈ 𝒦_1 and y ∈ 𝒦_2; and (iii) in addition all covers x ≺ y 
with both x, y ∈ 𝒦_1. Since the original algorithm also finds augmenting paths by BFS, 
we simply replace q by q̄ in the running time bound. 

We heuristically improve our matching algorithm by identifying all possible edge-disjoint
augmenting paths of the same breadth-first tree. In our experiments we never
required significantly more than $\sqrt{p}$ rounds of BFS to identify all augmenting paths,
indicating that the experimental running time stays around $\hat{q}\sqrt{p}$. However, for the hard
matching instances arising from extremely sparse ($n \approx 2m$) inputs it often took around 10
rounds of BFS to find all augmenting paths of the same length, a task completed in $O(q)$
time by the Hopcroft-Karp algorithm. Our next algorithms hence might outperform this
implementation.

¹ Further slight improvements in the log factors are achieved in the full paper [2].

It is not straightforward to modify the Hopcroft-Karp matching algorithm. That
algorithm finds shortest alternating paths by a depth-first search [3] of all edges along
which the distance to an unmatched vertex decreases. If this distance increases over
$\sqrt{p}$ then it switches to the basic alternating path algorithm. The running time $O(q\sqrt{p})$ arises
from the following two claims with $d = \sqrt{p}$:

Lemma 4 (Hopcroft and Karp [10]). At a certain stage of the Hopcroft-Karp matching
algorithm, let the length of a shortest augmenting path be $d$. Then (i) the total number
of times we backtracked along a given edge in depth-first search is at most $d$; this totals
to $O(qd)$ for all edges; and (ii) the size of the maximum matching may not exceed the
size of the current matching by more than $p/d$.

When simulating steps over comparable pairs by several steps over covers, the main
difficulty arises in (i) of the lemma: the same cover may have to be traversed for several
distinct comparable pairs and the bound $O(qd)$ for the number of backtracks will not
hold. Instead we will show that the average time spent for simulating a comparable pair
is $O(\log p)$; then we may set $d = \sqrt{p/\log p}$ to get a running time of $O(\hat{q}\sqrt{p \log p})$.
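
As a hedged aside, the choice of $d$ can be checked by balancing the two cost terms. The derivation below is only a sketch: it writes $q$ for the number of comparable pairs and $\hat{q}$ for the number of covers (notation assumed here), and it assumes that each augmentation after the switch costs one BFS over the relevant edge set.

```latex
\begin{align*}
\text{Hopcroft--Karp:}\quad
  & O(qd) + \frac{p}{d}\cdot O(q),
  && d=\sqrt{p} \;\Rightarrow\; O(q\sqrt{p}),\\
\text{cover version:}\quad
  & O(\hat{q}\,d\log p) + \frac{p}{d}\cdot O(\hat{q}),
  && d=\sqrt{p/\log p} \;\Rightarrow\; O\bigl(\hat{q}\sqrt{p\log p}\bigr).
\end{align*}
```

In both lines the first term comes from part (i) of Lemma 4 and the second from part (ii), which leaves at most $p/d$ augmentations.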

We sketch two algorithms to simulate a single comparable pair by covers in
$O(\log p)$ average time. For details see our full paper [2]. The first idea is to maintain
fragments of cover paths so that whenever a new vertex is reached, we may jump to the
end of a path passing through it. The Sleator-Tarjan dynamic tree data structure [15] is
capable of performing all necessary operations in $O(\log p)$ time.

A simpler randomized algorithm can be given for posets with the property that no
two elements may be connected by two distinct paths in the Hasse diagram. The posets in
Frank’s algorithm satisfy this property. In such a poset we select an edge $xy$ out of a certain
vertex $x \in \mathcal{K}_1$ as follows. Whenever possible, we select $y \in \mathcal{K}_2$ such that there is a
matching edge $yz$; we will then complete the simulation of a comparable pair. Otherwise,
if we have a unique choice of vertex $y$, we contract $x$ into $y$ (we use a Union-Find data
structure [3]). In all remaining cases we have at least two covers leading out of $x$; we
choose one of them at random. By the special property of the poset, we take $O(\log p)$
such steps before reaching a matching edge, with high probability.

4 Bounds for the Size of Poset K 

The central theorems in our analysis are non-trivial bounds on the vertex and the edge size
of the poset $\mathcal{K}$ obtained in Phase 3. We show that the poset has $O(n \log n)$ vertices and $O(n^2)$
comparable pairs. These results are tight within small constant multiplicative factors
(instances for tightness are in Figs. 4 and 6). By using the Hopcroft-Karp algorithm
[10] for Dilworth’s problem (with running time $O(q\sqrt{p})$ for a poset of $p$ elements
and $q$ comparable pairs), we may hence derive an $O(n^{2.5}\sqrt{\log n})$ running time bound;
preprocessing requires $O(m \log n)$ time in addition.

Theorem 5. Let $T(n)$ denote the maximum number of essential subpath-edge pairs over
$C$ that contain no two elements that cross. Then $T(n) = O(n \log n)$.

Theorem 6. There are $O(n^2)$ comparable and $\Omega(n^2)$ cover pairs of elements in $\mathcal{K}$.
Hence the edge size of the poset $\mathcal{K}$ is $\Theta(n^2)$.

Fig. 6. Any subset-edge pair of the upper shaded area of $n/4$ elements is a cover of any other
element of the lower shaded area of another $n/4$ elements. Hence the cover relation of the poset
has size $\Omega(n^2)$.



Although our experiments indicate that there are significantly fewer cover pairs than
comparable pairs, our theoretical running time bound is not based on the cover edge
algorithms of Section 3. We were not able to derive better bounds for cover edges: while
there are $O(n^2)$ transitive edges, the instance in Fig. 6 shows a matching $\Omega(n^2)$ bound
for the cover edges themselves.

Both proofs apply a divide-and-conquer technique similar to that of the preprocessing
step in Section 2. For the sake of simplicity we consider subpaths of a path $P$; the case
of a length $n$ cycle can be reduced to a length $2n$ path by doubling the edges. We
subdivide the path $P$ into $P_1$ and $P_2$, the first and last $n/2$ edges of $P$. For Theorem 5
it suffices to prove that there are $O(n)$ elements $(I, e) \in \mathcal{K}$ such that $I$ intersects both
$P_1$ and $P_2$. Similarly for Theorem 6 it suffices to show that there are $O(n^2)$ pairs of $\mathcal{K}$
with $(I, e) < (J, f)$ and $I$ or $J$ intersecting both $P_1$ and $P_2$. The second claim follows
relatively easily; we leave the proof for the full paper [2].

For Theorem 5 we next show the $O(n)$ bound. We immediately set aside those $O(n)$
elements $(I, f) \in \mathcal{K}$ where $f$ is the first edge of $I$. Thus it suffices to show that the
cardinality of $\mathcal{K}_1 = \{(I, f) \in \mathcal{K} : I$ contains the last edge of $P_1$ and $f$ is not the first
edge of $I\}$ is $O(n)$.

Let $P_1 = \{e_1, e_2, \ldots, e_k\}$. By the definition of essential the following mapping is
one-to-one: we map $(I, f) \in \mathcal{K}_1$ to pairs $(i, j)$ with $i < j$ where the first edge of $I$ is
$e_i$ and $f = e_j$. We will use $(i, j)$ as shorthand names of subpath-edge pairs $(I, f)$. By the
definition of essential and non-crossing we have:

Claim. There are no pairs $(i, j)$ and $(i', j') \in \mathcal{K}_1$ with $i' < i \le j' < j$. □

Notice that if $(i, j) \in \mathcal{K}_1$, then no pair $(s, j')$ with $i < s \le j$ and $j' > j$ may be in $\mathcal{K}_1$ by the Claim.
Hence $(i, j)$ “blocks” all elements $(s, j')$. Since $j \le |P_1|$, we are done by giving an
assignment of the elements $(i, j) \in \mathcal{K}_1$ to values $s \le j$ blocked by $(i, j)$ such that no
two elements $(i, j)$ and $(i', j')$ are mapped to the same value.

We achieve an assignment as above by ordering all $(i, j)$ by the increasing value of
$j - i$; we break ties by the increasing value of $i$. Then for all $(i, j)$ we assign the smallest
value $s > i$ not assigned to any other element of $\mathcal{K}_1$ preceding $(i, j)$.

To show the correctness of the assignment, by contradiction let $(i, j)$ be the first
element in the order where the assignment of a value $s \le j$ is not possible. Since the value
$i + 1$ satisfies $i < i + 1 \le j$, in particular $i + 1$ is assigned to some $(i', j')$ that precedes
$(i, j)$ in the ordering. Then

$i' < i + 1 \le j'$, i.e. $i' \le i < j'$, and (1)

$j' - i' \le j - i$, (2)

where (1) holds since $(i', j')$ is not a counterexample by the minimal choice of $(i, j)$; and
(2) holds by the definition of the ordering. Since $(i, j)$ and $(i', j')$ are different and $i' \le i$,
from (2) we get $j' < j$. If we assume $i' < i$ (strict inequality holds in (1)), the previous
observations are in contradiction with the Claim. Hence we conclude that $i' = i$, i.e.
there is an $(i, j') \in \mathcal{K}_1$ with $j' < j$. Let $j'$ be maximum among all elements $(i, j'') \in \mathcal{K}_1$
with $j'' < j$.

Consider now the value $j' + 1 \le j$ for $j'$ defined above; by the assumption that
$j' < j$ and hence $j' + 1 \le j$, the value $j' + 1$ is assigned to some $(i'', j'') \in \mathcal{K}_1$ with
(again)

$i'' < j' + 1 \le j''$ and (3)

$j'' - i'' \le j - i$. (4)

We distinguish three cases by comparing $i$ and $i''$. If $i'' = i$, then $j'' < j$ by (4) and
$j' < j''$ contradicts the maximal choice of $j'$. If $i'' < i$, then $j'' < j$ by (4);
$i < j' + 1$ since $(i, j')$ is a pair and $x < y$ for all pairs $(x, y)$; finally $j' < j''$ by (3),
adding up to $i'' < i < j''$ and $j'' < j$, contradicting the Claim. In the last case $i < i''$,
we have $i < i'' \le j'$ and $j' < j''$, both following from (3); the choice of $(i, j')$ and $(i'', j'')$
contradicts the Claim again.
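
The assignment used in this proof is easy to try out on small families of index pairs. The sketch below is an illustration only: the function names are hypothetical, and the non-crossing condition is assumed to have the form "no $i' < i \le j' < j$".

```python
def satisfies_claim(pairs):
    """True if no two pairs (i', j') and (i, j) have i' < i <= j' < j."""
    return not any(
        i2 < i1 <= j2 < j1
        for (i1, j1) in pairs   # plays the role of (i, j)
        for (i2, j2) in pairs   # plays the role of (i', j')
    )


def assign_blocked_values(pairs):
    """Give each pair (i, j) a distinct value s with i < s <= j.

    Pairs are processed by increasing j - i, ties broken by increasing i,
    and each pair takes the smallest free value larger than i.
    Raises ValueError if some pair cannot be served.
    """
    order = sorted(pairs, key=lambda ij: (ij[1] - ij[0], ij[0]))
    used = set()
    assignment = {}
    for i, j in order:
        s = i + 1
        while s in used:
            s += 1
        if s > j:
            raise ValueError(f"no free value for pair {(i, j)}")
        used.add(s)
        assignment[(i, j)] = s
    return assignment
```

Since the assigned values are distinct and never exceed $|P_1|$, a successful assignment bounds the number of pairs by $|P_1|$, which is the $O(n)$ bound needed.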

5 Knuth’s Algorithm vs. Ours: Performance Tests 

Our experiments were conducted on a SPARC-10 with 160M memory. We calibrated the
machine as designed by the organizers of the First DIMACS Implementation Challenge
[11]; we obtained the following user times in seconds (no optimization . . . optimization
level 4): first test 0.5, 0.3, 0.3, 0.2, 0.3; second test 4.6, 3.0, 2.9, 3.0, 3.0. In the table $n_r$
and $m_r$ are the new input sizes after the reductions are applied; $p$ and $q$ are the poset sizes;
$n$, $n_r$, $m$, $m_r$ and $p$ are in thousands; $q$ is in millions. “Knuth” denotes the user time in
seconds for [13] with our initial reduction applied; “Frank” denotes the user time of our
algorithm. We ran each test twice with two different seeds; we observed no variation
(time, $n_r$, $m_r$, $p$ or $q$) exceeding 5%.

Our implementation performs well on relatively dense inputs (the maximum density is
$m = O(n \log n)$ due to the initial reduction); here Knuth’s algorithm ran out of memory
for the largest instance. The Dilworth instances are easy or completely trivial; matching
takes 10% of user time. In contrast, our memory consumption is high for sparse inputs
and Knuth’s algorithm performs much better if the density is extremely low ($n \approx 2m$;
the $2m$ first and last edges of the set of subpaths are distinct). Here our implementation
spends 90% of its time on matchings and takes slightly more than $\sqrt{p}$ BFS steps; a better
matching algorithm might be needed here.

6 Conclusion

We implemented and analyzed a new efficient $O(n^{2.5}\sqrt{\log n} + m \log n)$ time algorithm
to find the minimum number of so-called generators of a set of subpaths of a cycle.
Previous implementations are only able to handle the case of subpaths of a path instead
of a cycle. We compared our implementation (restricted to paths) to Knuth’s [13] previous
algorithm. While by our results Knuth’s algorithm is faster in theory, our experiments
with random input indicate the contrary for all but the extremely sparse problems. Of
independent interest, our experiments resulted in new algorithms for Dilworth’s theorem.

Abstract. Let $\mathcal{H}$ be a laminar family of subsets of a groundset $V$. A $k$-cover of
$\mathcal{H}$ is a multiset $C$ of edges on $V$ such that for every subset $S$ in $\mathcal{H}$, $C$ has at least
$k$ edges that have exactly one end in $S$. A $k$-packing of $\mathcal{H}$ is a multiset $P$ of edges
on $V$ such that for every subset $S$ in $\mathcal{H}$, $P$ has at most $k \cdot u(S)$ edges that have
exactly one end in $S$. Here, $u$ assigns an integer capacity to each subset in $\mathcal{H}$.
Our main results are: (a) Given a $k$-cover $C$ of $\mathcal{H}$, there is an efficient algorithm
to find a 1-cover contained in $C$ of size $\le k|C|/(2k - 1)$. For 2-covers, the
factor of 2/3 is best possible. (b) Given a 2-packing $P$ of $\mathcal{H}$, there is an efficient
algorithm to find a 1-packing contained in $P$ of size $\ge |P|/3$. The factor of 1/3
for 2-packings is best possible.

These results are based on efficient algorithms for finding appropriate colorings
of the edges in a $k$-cover or a 2-packing, respectively, and they extend to the
case where the edges have nonnegative weights. Our results imply approximation
algorithms for some NP-hard problems in connectivity augmentation and related
topics. In particular, we have a 4/3-approximation algorithm for the following
problem: Given a tree $T$ and a set of nontree edges $E$ that forms a cycle on the
leaves of $T$, find a minimum-size subset $E'$ of $E$ such that $T + E'$ is 2-edge
connected.



1 Introduction 

Let $\mathcal{H}$ be a laminar family of subsets of a groundset $V$. In detail, let $V$ be a groundset,
and let $\mathcal{H} = \{S_1, S_2, \ldots, S_q\}$ be a set of distinct subsets of $V$ such that for every
$1 \le i, j \le q$, $S_i \cap S_j$ is exactly one of $\emptyset$, $S_i$ or $S_j$. A $k$-cover of $\mathcal{H}$ is a multiset of edges,
$C$, such that for every subset $S$ in $\mathcal{H}$, $C$ has at least $k$ edges (counting multiplicities)
that have exactly one end in $S$. A $k$-packing of $\mathcal{H}$ is a multiset of edges, $P$, such that for
every subset $S$ in $\mathcal{H}$, $P$ has at most $k \cdot u(S)$ edges (counting multiplicities) that have
exactly one end in $S$. Here, $u$ assigns an integer capacity to each subset in $\mathcal{H}$. Our main
results are:

1. Given a $k$-cover $C$ of $\mathcal{H}$, there is an efficient algorithm to find a 1-cover contained
in $C$ of size $\le k|C|/(2k - 1)$. For 2-covers, the factor of 2/3 is best possible.

2. Given a 2-packing $P$ of $\mathcal{H}$, there is an efficient algorithm to find a 1-packing
contained in $P$ of size $\ge |P|/3$. The factor of 1/3 is best possible.

All of these results extend to the weighted case, where the edges have nonnegative
weights. Also, we show that the following two problems are NP-hard: (1) Given
a 2-cover $C$ of $\mathcal{H}$, find a minimum-size 1-cover that is contained in $C$. (2) Given a
2-packing $P$ of $\mathcal{H}, u$, find a maximum-size 1-packing that is contained in $P$.

The upper bound of 2/3 on the ratio of the minimum size of a 1-cover versus the
size of a (containing) 2-cover is tight. To see this, consider the complete graph $K_3$, and
the laminar family $\mathcal{H}$ consisting of three singleton sets. Let the 2-cover be $E(K_3)$. A
minimum 1-cover has 2 edges from $K_3$. The same example, with unit capacities for the
three singleton sets in $\mathcal{H}$, shows that the ratio of the maximum size of a 1-packing versus
the size of a (containing) 2-packing may equal 1/3. There is an infinite family of similar
examples.
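
The $K_3$ example is small enough to verify by brute force. The sketch below is illustrative only; the helper names (`covers`, `cut_degree`, `min_one_cover`, `max_one_packing`) are hypothetical, and the exhaustive search is meant for tiny instances, not as the algorithm of this paper.

```python
from itertools import combinations


def covers(edge, subset):
    """An edge covers a set if exactly one of its ends lies in the set."""
    u, v = edge
    return (u in subset) != (v in subset)


def cut_degree(edges, subset):
    """Number of edges (with multiplicity) covering the set."""
    return sum(covers(e, subset) for e in edges)


def min_one_cover(edges, family):
    """Smallest sub-multiset of edges covering every set at least once."""
    for size in range(len(edges) + 1):
        for sub in combinations(edges, size):
            if all(cut_degree(sub, S) >= 1 for S in family):
                return list(sub)
    return None


def max_one_packing(edges, family, capacity):
    """Largest sub-multiset with at most capacity[S] edges covering each S."""
    for size in range(len(edges), -1, -1):
        for sub in combinations(edges, size):
            if all(cut_degree(sub, S) <= capacity[S] for S in family):
                return list(sub)
    return []


# K_3 with the three singleton sets and unit capacities.
k3_edges = [(0, 1), (1, 2), (0, 2)]
singletons = [frozenset({0}), frozenset({1}), frozenset({2})]
unit = {S: 1 for S in singletons}
```

On this instance `k3_edges` is both a 2-cover and a 2-packing, and the two searches return sub-multisets of sizes 2 and 1 respectively, matching the ratios 2/3 and 1/3.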

An edge is said to cover a subset $S$ of $V$ if the edge has exactly one end in $S$. Our
algorithm for finding a small-size 1-cover from a given 2-cover constructs a “good”
3-coloring of (the edges of) the 2-cover. In detail, the 3-coloring is such that for every
subset $S$ in the laminar family, at least two different colors appear among the edges
covering $S$. The desired 1-cover is obtained by picking the two smallest (least weight)
color classes. Similarly, our algorithm for finding a large-size 1-packing from a given
2-packing constructs a 3-coloring of (the edges of) the 2-packing such that for every subset
$S$ in the laminar family, at most $u(S)$ of the edges covering $S$ have the same color. The
desired 1-packing is obtained by picking the largest (most weight) color class.

1.1 A Linear Programming Relaxation 

Consider the natural integer programming formulation (IP) of our minimum 1-cover
problem. Let the given $k$-cover be denoted by $E$. There is a (nonnegative) integer variable
$x_e$ for each edge $e \in E$. For each subset $S \in \mathcal{H}$, there is a constraint $\sum_{e \in \delta(S)} x_e \ge 1$,
where $\delta(S)$ denotes the set of edges covering $S$. The objective function is to minimize
$\sum_e w_e x_e$, where $w_e$ is the weight of edge $e$. Let (LP) be the following linear program
obtained by relaxing all of the integrality constraints on the variables.

(LP) $z_{LP} = \min \sum_e w_e x_e$ s.t. $\{ \sum_{e \in \delta(S)} x_e \ge 1, \ \forall S \in \mathcal{H}; \ x_e \ge 0, \ \forall e \in E \}$.

Clearly, (LP) is solvable in polynomial time. The $k$-cover gives a feasible solution to
(LP) by fixing $x_e = 1/k$ for each edge $e$ in the $k$-cover.
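
The feasibility of this fractional solution follows in one line from the definition of a $k$-cover; the short check below is spelled out only for convenience, with $d_E(S) = |\delta(S)|$ counting edges of $E$ with multiplicity and $w(E)$ denoting the total weight of $E$.

```latex
\[
\sum_{e \in \delta(S)} x_e \;=\; \frac{d_E(S)}{k} \;\ge\; \frac{k}{k} \;=\; 1
\quad \text{for all } S \in \mathcal{H},
\qquad\text{hence}\qquad
z_{LP} \;\le\; \sum_{e \in E} \frac{w_e}{k} \;=\; \frac{w(E)}{k}.
\]
```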

For the minimum 1-cover problem, Theorem 3 below shows that the optimal value of
the integer program (IP) is $\le 4/3$ times the optimal value of a half-integral solution to the
LP relaxation (LP). (A feasible solution $x$ to (LP) is called half-integral if $x_e \in \{0, \frac{1}{2}, 1\}$
for all edges $e$.) There are examples where the LP relaxation has a unique optimal solution
that is not half-integral. For the maximum 1-packing problem, Theorem 6 shows that the
optimal value of the integer program is $\ge 2/3$ times the optimal value of a half-integral
solution to the LP relaxation.

Recall that a laminar family $\mathcal{H}$ may be represented as a tree $T = T(\mathcal{H})$. ($T$ has a
node for $V$ as well as for each set $A_i \in \mathcal{H}$, and $T$ has an edge $A_iA_j$ if $A_j \in \{V\} \cup \mathcal{H}$
is the smallest set properly containing $A_i \in \mathcal{H}$.)
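
A minimal sketch of this tree representation follows. It is an illustration under stated assumptions: sets are stored as `frozenset` objects, the input family is assumed to be laminar, and the name `laminar_tree` is hypothetical.

```python
def laminar_tree(ground, family):
    """Return a dict mapping each set of the family to its parent in T.

    The root is the ground set V. The parent of A is the smallest set of
    {V} united with the family that properly contains A. In a laminar
    family the proper supersets of A form a chain, so the smallest one
    is unique.
    """
    root = frozenset(ground)
    sets = {frozenset(S) for S in family}
    sets.discard(root)            # V itself is the root, not a child
    nodes = sets | {root}
    parent = {}
    for A in sets:
        supersets = [B for B in nodes if A < B]   # proper supersets of A
        parent[A] = min(supersets, key=len)
    return parent
```

Each tree edge joins a set to its parent, so the tree has one edge per set of the family (other than $V$).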

Two special cases of the minimum 1-cover problem are worth mentioning. (i) If
the laminar family $\mathcal{H}$ is such that the tree $T(\mathcal{H})$ is a path, then the LP relaxation has an
integral optimal solution. This follows because the constraint matrix of the LP relaxation
is essentially a network matrix, see [CCPS 98, Theorem 6.28], and hence the matrix is
totally unimodular; consequently, every extreme point solution (basic feasible solution)
of the LP relaxation is integral. (ii) If the laminar family $\mathcal{H}$ is such that the tree $T(\mathcal{H})$ is
a star (i.e., the tree has one nonleaf node, and that is adjacent to all the leaf nodes) then
the LP relaxation has a half-integral optimal solution. This follows because in this case
the LP relaxation is essentially the same as the linear program of the fractional matching
polytope, which has half-integral extreme point solutions, see [CCPS 98, Theorem 6.13].

1.2 Equivalent Problems 

The problem of finding a minimum 1-cover of a laminar family $\mathcal{H}$ from among the
multiedges of a $k$-cover $E$ may be reformulated as a connectivity augmentation problem.
Let $T = T(\mathcal{H})$ be the tree representing $\mathcal{H}$; note that $E(T)$ is disjoint from $E$. Then the
problem is to find a minimum weight subset of edges $E'$ contained in $E$ such that
$T + E' = (V(T), E(T) \cup E')$ is 2-edge connected; we may assume that $E'$ has no
multiedges. Instead of taking $T$ to be a tree, we may take $T$ to be a connected graph. This
gives the problem CBRA which was initially studied by Eswaran & Tarjan [ET 76], and
by Frederickson & Ja’ja’ [FJ 81].

Similarly, the problem of finding a maximum 1-packing of a capacitated laminar
family $\mathcal{H}, u$ from among the multiedges of a $k$-packing $E$ may be reformulated as follows.
Let $T = T(\mathcal{H})$ be the tree representing $\mathcal{H}$, and let the tree edges have (nonnegative)
integer capacities $u : E(T) \to \mathbb{Z}$; the capacity of a set $A_i \in \mathcal{H}$ corresponds to the capacity
of the tree edge representing $A_i$. The $k$-packing $E$ corresponds to a set of demand
edges. The problem is to find a maximum integral multicommodity flow $x : E \to \mathbb{Z}$
where the source-sink pairs (of the commodities) are as specified by $E$. In more detail,
the objective is to maximize the total flow $\sum_{e \in E} x_e$ subject to the capacity constraints,
namely, for each tree edge $a_i$ the sum of the $x$-values over the demand edges in the cut
given by $T - a_i$ is $\le u(a_i)$, and the constraints that $x$ is integral and $\ge 0$.

1.3 Approximation Algorithms for NP-hard Problems in Connectivity 
Augmentation 

Our results on 2-covers and 2-packings imply improved approximation algorithms for
some NP-hard problems in connectivity augmentation and related topics. Frederickson
and Ja’ja’ [FJ 81] showed that problem CBRA is NP-hard and gave a 2-approximation
algorithm. Later, Khuller and Vishkin [KV 94] gave another 2-approximation algorithm
for a generalization, namely, find a minimum-weight $k$-edge connected spanning
subgraph of a given weighted graph. Subsequently, Garg et al. [GVY 97, Theorem 4.2]
showed that problem CBRA is max SNP-hard, implying that there is no polynomial-time
approximation scheme for CBRA modulo the P ≠ NP conjecture. Currently, the
best approximation guarantee known for CBRA is 2.

Our work is partly motivated by the question of whether or not the approximation
guarantee for problem CBRA can be improved to be strictly less than 2 (i.e., to $2 - \epsilon$ for
a constant $\epsilon > 0$). We give a 4/3-approximation algorithm for an NP-hard problem that
is a special case of CBRA, namely, the tree plus cycle (TPC) problem. See Section 4.

Garg, Vazirani and Yannakakis [GVY 97] show that the above maximum 1-packing
problem (equivalently, the above multicommodity flow problem) is NP-hard and they
give a 2-approximation algorithm. In fact, they show that the optimal value of an integral
1-packing $z_{IP}$ is $\ge 1/2$ times the optimal value of a fractional 1-packing $z_{LP}$. We do
not know whether the factor 1/2 here is tight.

It should be noted that the maximum 1-packing problem for the special case of unit
capacities (i.e., $u(A_i) = 1$, $\forall A_i \in \mathcal{H}$) is polynomial-time solvable. If the capacities are
either one or two, and the tree $T(\mathcal{H})$ representing the laminar family $\mathcal{H}$ has height two
(i.e., every tree path has length $\le 4$), then the problem may be NP-hard, see [GVY 97,
Lemma 4.3].

Further discussion on related topics may be found in the survey papers by Frank
[F 94], Hochbaum [Hoc 96], and Khuller [Kh 96]. Jain [J 98] has interesting recent
results, including a 2-approximation algorithm for an important generalization of
problem CBRA.

We close this section by introducing some notation. For a multigraph $G = (V, E)$
and a node set $S \subseteq V$, let $\delta_E(S)$ denote the multiset of edges in $E$ that have exactly one
end node in $S$, and let $d_E(S)$ denote $|\delta_E(S)|$; so $d_E(S)$ is the number of multiedges in
the cut $(S, V - S)$.

2 Obtaining a 1-Cover from a k-Cover

This section has our main result on $k$-covers, namely, there exists a 1-cover whose size
(or weight) is at most $k/(2k - 1)$ times the size (or weight) of a given $k$-cover. The
main step (Proposition 2) is to show that there exists a “good” $(2k - 1)$-coloring of any
$k$-cover. We start with a preliminary lemma.

Lemma 1. Let $V$ be a set of nodes, and let $\mathcal{H}$ be a laminar family on $V$. Let $E$ be a
minimal $k$-cover of $\mathcal{H}$. Then there exists a set $X \in \mathcal{H}$ such that $d_E(X) = k$ and no
proper subset $Y$ of $X$ is in $\mathcal{H}$.

Proof. Since $E$ is minimal, there exists at least one set $X \in \mathcal{H}$ with $d_E(X) = k$. We
call a node set $X \subseteq V$ a tight set if $d_E(X) = k$. Consider an inclusionwise minimal
tight set $X$ in $\mathcal{H}$. Suppose there exists a $Y \subset X$ such that $Y \in \mathcal{H}$. If each edge of $E$
that covers $Y$ also covers $X$, then we have $d_E(Y) = k$. But this contradicts our choice
of $X$. Thus there exists an edge $xy \in E$ covering $Y$ with $x, y \in X$. By the minimality
of $E$, $xy$ must cover a tight set $Z \in \mathcal{H}$. Since $\mathcal{H}$ is a laminar family, $Z$ must be a proper
subset of $X$. This contradiction to our choice of $X$ proves the lemma. □

Proposition 2. Let $V$ be a set of nodes, and let $\mathcal{H}$ be a laminar family on $V$. Let $E$ be a
minimal $k$-cover of $\mathcal{H}$. Then there is a $(2k - 1)$-coloring of (the edges in) $E$ such that

(i) each set $X \in \mathcal{H}$ is covered by edges of at least $k$ different colors, and

(ii) for every node $v$ with $d_E(v) \le k$, all of the edges incident to $v$ have distinct colors.

Proof. The proof is by induction on $|\mathcal{H}|$. For $|\mathcal{H}| = 1$ the result holds since there are $k$
edges in $E$ (since $E$ is minimal) and these can be assigned different colors. (For $|\mathcal{H}| = 0$,
$|E| = 0$ so the result holds. However, even if $E$ is nonempty, it is easy to color the edges
in an arbitrary order to achieve property (ii).)

Now, suppose that the result holds for laminar families of cardinality $\le N$. Consider
a laminar family $\mathcal{H}$ of cardinality $N + 1$, and let $E$ be a minimal $k$-cover of $\mathcal{H}$. By
Lemma 1, there exists a tight set $A \in \mathcal{H}$ (i.e., $d_E(A) = k$) such that no $Y \subset A$ is in
$\mathcal{H}$. We contract the set $A$ to one node $v_A$, and accordingly update the laminar family
$\mathcal{H}$. Then we remove the singleton set $\{v_A\}$ from $\mathcal{H}$. Let the resulting laminar family be
$\mathcal{H}'$, and note that it has cardinality $N$. Clearly, $E$ is a $k$-cover of $\mathcal{H}'$. Let $E' \subseteq E$ be a
minimal $k$-cover of $\mathcal{H}'$. By the induction hypothesis, $E'$ has a $(2k - 1)$-coloring that
satisfies properties (i) and (ii), i.e., $E'$ has a good $(2k - 1)$-coloring.

If the node $v_A$ is incident to $\ge k$ edges of $E'$, then note that $E'$ with its $(2k - 1)$-coloring
is good with respect to $\mathcal{H}$ (i.e., properties (i) and (ii) hold for $\mathcal{H}$ too). To see
this, observe that $k \le d_{E'}(v_A) \le d_E(v_A) = k$, so $d_{E'}(v_A) = k$; hence, the $k$ edges of
$E'$ incident to $v_A$ get distinct colors by property (ii). Then, for the original node set $V$,
the $k$ edges of $E'$ covering $A$ get $k$ different colors.

Now focus on the case when $d_{E'}(v_A) < k$. Clearly, each edge in $E - E'$ is incident
to $v_A$, since each edge in $E$ not incident to $v_A$ covers some tight set that is in both $\mathcal{H}$
and $\mathcal{H}'$. We claim that the remaining edges of $E - E'$ incident to $v_A$ can be colored and
added to $E'$ in such a way that $E$ with its $(2k - 1)$-coloring is good with respect to $\mathcal{H}$.

It is easy to assign colors to the edge (or edges) of $E - E'$ such that the $k$ edges
of $E$ incident to $v_A$ get different colors. The difficulty is that property (ii) has to be
preserved, that is, we must not “create” nodes of degree $\le k$ that are incident to two
edges of the same color. It turns out that this extra condition is easily handled as follows.
Let $e \in E - E'$ be an edge incident to $v_A$, and let $w \in V$ be the other end node of $e$.
If $w$ has degree $\le k$ for the current subset of $E$, then $e$ is incident to $\le (2k - 2)$ other
edges; since $(2k - 1)$ colors are available, we can assign $e$ a color different from the
colors of all the edges incident to $e$. Otherwise ($w$ has degree $> k$ for the current subset
of $E$), the other edges incident to $w$ impose no coloring constraint on $e$, and we assign
$e$ a color different from the colors of the other edges incident to $v_A$; this is easy since
$d_E(v_A) = k$. □

Theorem 3. Let $V$ be a node set, and let $\mathcal{H}$ be a laminar family on $V$. Let $E$ be a $k$-cover
of $\mathcal{H}$, and let each edge $e \in E$ have a nonnegative weight $w(e)$. Then there is a 1-cover
of $\mathcal{H}$, call it $E'$, such that $E' \subseteq E$ and $w(E') \le k\,w(E)/(2k - 1)$. Moreover, there is an
efficient algorithm that given $E$ finds $E'$; the running time is $O(\min\{k|V|^2, k^2|V|\})$.

Proof. We construct a good $(2k - 1)$-coloring of the $k$-cover $E$ by applying Proposition 2
to a minimal $k$-cover $\tilde{E} \subseteq E$ and then “extending” the good $(2k - 1)$-coloring of $\tilde{E}$ to
$E$. That is, we partition $E$ into $(2k - 1)$ subsets such that each set $X$ in $\mathcal{H}$ is covered by
edges from at least $k$ of these subsets. We take $E'$ to be the union of the cheapest $k$ of
the $(2k - 1)$ subsets. Clearly, the weight of $E'$ is at most $k/(2k - 1)$ of the weight of
$E$, and (by property (i) of Proposition 2) $E'$ is a 1-cover of $\mathcal{H}$.

Consider the time complexity of the construction in Proposition 2. Let $n = |V|$;
then note that $|\mathcal{H}| \le 2n$ and $|E| \le 2kn$. The construction is easy to implement in time
$O(|\mathcal{H}| \cdot |E|) = O(kn^2)$. Also, for $k < n$, the time complexity can be improved to
$O(k^2 \cdot |\mathcal{H}|) = O(k^2 n)$. To see this, note that for each set $A \in \mathcal{H}$ we assign colors to at
most $k$ of the edges covering $A$ after we contract $A$ to $v_A$, and for each such edge $e$ we
examine at most $(2k - 2)$ edges incident to $e$. □
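
Once a good $(2k - 1)$-coloring is available, the last step of Theorem 3 is a simple selection. The sketch below covers only that step and takes the coloring as given; the function name and the dictionary-based edge representation are assumptions made for illustration.

```python
def cheapest_classes(weights, colors, k):
    """Keep the edges of the k cheapest of the 2k - 1 color classes.

    weights -- dict: edge id -> nonnegative weight
    colors  -- dict: edge id -> color in range(2 * k - 1)
    """
    total = {c: 0 for c in range(2 * k - 1)}
    for e, c in colors.items():
        total[c] += weights[e]
    # The k cheapest classes weigh at most k / (2k - 1) of the total weight.
    chosen = set(sorted(total, key=total.get)[:k])
    return [e for e, c in colors.items() if c in chosen]


# Example: K_3 with singleton sets, k = 2, each edge in its own color class.
w = {"ab": 5, "bc": 1, "ac": 3}
col = {"ab": 0, "bc": 1, "ac": 2}
picked = cheapest_classes(w, col, 2)
```

Only $k - 1$ classes are discarded, so a set covered by at least $k$ different colors keeps at least one covering edge; this is why the selection remains a 1-cover.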

3 Obtaining a 1-Packing from a 2-Packing 

This section has our main result on 2-packings, namely, there exists a 1-packing whose
size (or weight) is at least 1/3 times the size (or weight) of a given 2-packing. First, we
show that there is no loss of generality in assuming that the 2-packing forms an Eulerian
multigraph. Then we give a 3-coloring for the edges of the 2-packing such that for each
set $S$ in the laminar family at most $u(S)$ edges covering $S$ have the same color. We take
the desired 1-packing to be the biggest color class.

Lemma 4. Let $V$ be a set of nodes, let $\mathcal{H}$ be a laminar family on $V$, and let $u : \mathcal{H} \to \mathbb{Z}$
assign an integral capacity to each set in $\mathcal{H}$. Let $E$ be a 2-packing of $\mathcal{H}, u$, i.e., for
all sets $A_i \in \mathcal{H}$, $d_E(A_i) \le 2u(A_i)$. If $E$ is a maximal 2-packing, then the multigraph
$G = (V, E)$ is Eulerian.

Proof. If $G$ is not Eulerian, then it has an even number ($\ge 2$) of nodes of odd degree. Let
$A \in \{V\} \cup \mathcal{H}$ be an inclusionwise minimal set that contains $\ge 2$ nodes of odd degree. For
every proper subset $S$ of $A$ that is in $\mathcal{H}$ and that contains an odd-degree node, note that
$d_E(S)$ is odd, hence, this quantity is strictly less than the capacity $2u(S)$. Consequently,
we can add an edge (or another copy of the edge) $vw$ where $v, w$ are odd-degree nodes
in $A$ to get $E \cup \{vw\}$ and this stays a 2-packing of $\mathcal{H}, u$. This contradicts our choice of
$E$, since $E$ is a maximal 2-packing. Consequently, $G$ has no nodes of odd degree, i.e.,
$G$ is Eulerian. □

Proposition 5. Given an Eulerian multigraph $G = (V, E)$, an arbitrary pairing $\mathcal{P}$ of
the edges such that for every edge-pair the two edges have a common end node, and a
laminar family of node sets $\mathcal{H}$, there is a 3-coloring of $E$ such that

(i) for each cut $\delta_E(A_i)$, $A_i \in \mathcal{H}$, at most half of the edges have the same color, and

(ii) for each edge-pair $e, f$ in $\mathcal{P}$, the edges $e$ and $f$ have different colors.

Proof. Let $\mathcal{P}$ be a set of triples $[v, e, f]$, where $e$ and $f$ are paired edges incident to the
node $v$. Note that an edge $e = vw$ may occur in two triples $[v, e, f]$ and $[w, e, g]$. W.l.o.g.
assume that $\mathcal{P}$ gives, for each node $v$, a pairing of all the edges incident to $v$. Then $\mathcal{P}$
partitions $E$ into one or more (edge disjoint) subgraphs $Q_1, Q_2, \ldots$, where each subgraph
$Q_j$ is a connected Eulerian multigraph. To see this, focus on the Eulerian tour given by
fixing the successor of any edge $e = vw$ to be the other edge in the triple $[w, e, f] \in \mathcal{P}$,
assuming $e$ is oriented from $v$ to $w$; each such Eulerian tour gives a subgraph $Q_j$.

If $\mathcal{H} = \emptyset$, then we color each subgraph $Q_j$ with 3 colors such that no two edges in
the same edge-pair in $\mathcal{P}$ get the same color. This is easy: We traverse the Eulerian tour
of $Q_j$ given by $\mathcal{P}$, and alternately assign the colors red and blue to the edges in $Q_j$, and
if necessary, we assign the color green to the last edge of $Q_j$.
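
The base case can be sketched directly. The snippet below is an illustration under the assumption that the edges of one closed tour are already listed in tour order, so that paired edges are cyclically consecutive; the function name is hypothetical and a tour is assumed to have at least two edges.

```python
def color_closed_tour(num_edges):
    """Color edges 0 .. num_edges - 1 of a closed tour, listed in tour order.

    Edges alternate red and blue. If the tour has odd length, the last edge
    would repeat the color of the first one, so it is colored green instead.
    """
    colors = ["red" if t % 2 == 0 else "blue" for t in range(num_edges)]
    if num_edges % 2 == 1 and num_edges > 1:
        colors[-1] = "green"
    return colors
```

For example, a tour with five edges is colored red, blue, red, blue, green, and every two cyclically consecutive edges differ.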

Otherwise, we proceed by induction on the number of sets in $\mathcal{H}$. We take an
inclusionwise minimal set $A \in \mathcal{H}$, shrink it to a single node $v_A$, and update $G = (V, E)$, $\mathcal{H}$
and $\mathcal{P}$ to $G' = (V', E')$, $\mathcal{H}'$ and $\mathcal{P}'$. Here, $\mathcal{H}' = \mathcal{H} - \{A\}$, i.e., the singleton set $\{v_A\}$
is not kept in $\mathcal{H}'$. Also, we add new edge pairs to $\mathcal{P}'$ to ensure that all edges incident to
$v_A$ are paired. For a node $v \notin A$, all its triples $[v, e, f] \in \mathcal{P}$ are retained in $\mathcal{P}'$. Consider
the pairing of all the edges incident to $v_A$ in $G'$. For each triple $[v, e, f]$ in $\mathcal{P}$ such that
$v \in A$ and each of $e, f$ has one end node in $V - A$ (so $e, f$ are both incident to $v_A$ in $G'$),
we replace the triple by $[v_A, e, f]$. We arbitrarily pair up the remaining edges incident
to $v_A$ in $G'$.

By the induction hypothesis, there exists a good 3-coloring for $G', \mathcal{H}', \mathcal{P}'$. It remains
to 3-color the edges with both ends in $A$. For this, we shrink the nodes in $V - A$ to a single
node $v_B$, and update $G = (V, E), \mathcal{P}, \mathcal{H}$ to $G'' = (V'', E''), \mathcal{P}'', \mathcal{H}''$; note that $\mathcal{H}''$ is the
empty family and so may be ignored. We also keep the 3-coloring of $\delta_{E'}(v_A) = \delta_{E''}(v_B)$.
Our final goal is to extend this 3-coloring to a good 3-coloring of $E''$ respecting $\mathcal{P}''$.
We must check that this can always be done. Consider the differently-colored edge pairs
incident to $v_B$. Consider any connected Eulerian subgraph $Q_j$ containing one of these
edge pairs $e_1, e_2$; the corresponding triple in $\mathcal{P}''$ is $[v_B, e_1, e_2]$. Let $Q'_j$ be a minimal walk
of (the Eulerian tour of) $Q_j$ starting with $e_2$ and ending with an edge $f$ incident to $v_B$
(possibly, $f = e_1$). The number of internal edges in $Q'_j$ is $\equiv 0$ or $1 \pmod 2$, and the
two terminal edges either have the same color or not. If the number of internal edges in
$Q'_j$ is nonzero, then it is easy to assign one, two, or three colors to these edges such that
every pair of consecutive edges gets two different colors. The remaining case is when $Q'_j$
has no internal edges, say, $Q'_j = v_B, e_2, w, f, v_B$, where $w$ is a node in $A$. Then edges
$e_2, f$ are paired via the common end-node $w$, i.e., the triple $[w, e_2, f]$ is present in both
$\mathcal{P}''$ and $\mathcal{P}$. Then, by our construction of $\mathcal{P}'$ from $\mathcal{P}$, the triple $[v_A, e_2, f]$ is in $\mathcal{P}'$, and
so edges $e_2$ and $f$ (which are paired in $\mathcal{P}'$ and present in $\delta_{E'}(v_A) = \delta_{E''}(v_B)$) must get
different colors. Hence, a good 3-coloring of $G', \mathcal{H}', \mathcal{P}'$ can always be extended to give
a good 3-coloring of $Q'_j$, and the construction may be repeated to give a good 3-coloring
of $Q_j$.

Finally, note that $E''$ is partitioned by $\mathcal{P}''$ into several connected Eulerian subgraphs
$Q_1, Q_2, \ldots$, where some of these subgraphs contain edges of $\delta_{E''}(v_B)$ and others do
not. Clearly, the good 3-coloring of $G', \mathcal{H}', \mathcal{P}'$ can always be extended to give a good
3-coloring of each of $Q_1, Q_2, \ldots$, and thus we obtain a good 3-coloring of $G, \mathcal{H}, \mathcal{P}$. □

Theorem 6. Let $V$ be a node set, let $\mathcal{H}$ be a laminar family on $V$, and let $u : \mathcal{H} \to \mathbb{Z}_+$
assign an integer capacity to each set in $\mathcal{H}$. Let $E$ be a 2-packing of $\mathcal{H}, u$, and let each
edge $e \in E$ have a nonnegative weight $w(e)$. Then there is a 1-packing of $\mathcal{H}, u$, call it $E'$,
such that $E' \subseteq E$ and $w(E') \ge w(E)/3$. Moreover, there is an efficient algorithm that
given $E$ finds $E'$; the running time is $O(|V| \cdot |E|)$.

Proof. If the multigraph $(V, E)$ is not Eulerian, then we use the construction in Lemma 4
to add a set of edges to make the resulting multigraph Eulerian without violating the
2-packing constraints. We assign a weight of zero to each of the new edges. Let us
continue to use $E$ to denote the edge set of the resulting multigraph. We construct a good
3-coloring of the 2-packing $E$ by applying Proposition 5. Let $F$ be the most expensive
of the three "color classes"; so, the weight of $F$, $w(F)$, is $\ge w(E)/3$. Note that $F$ is
a 1-packing of $\mathcal{H}, u$ by property (i) in the proposition, since for every set $A_i \in \mathcal{H}$ we
have $d_F(A_i) \le d_E(A_i)/2 \le u(A_i)$. Finally, we discard any new edges in $F$ (i.e., the
edges added by the construction in Lemma 4) to get the desired 1-packing.

Consider the time complexity of the whole construction. It is easy to see that the
construction in Proposition 5 for the minimal set $A \in \mathcal{H}$ takes linear time. This
construction may have to be repeated $|\mathcal{H}| = O(|V|)$ times. Hence, the overall running time
is $O(|V| \cdot |E|)$. □
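
The last step of the proof, picking the heaviest of the three color classes, can be sketched in Python. This is only an illustration under assumptions: the good 3-coloring is taken as given in a hypothetical list `color` (computing it is the job of Proposition 5 and is not shown), and edges are identified by their index.

```python
from collections import defaultdict

def heaviest_color_class(weights, color):
    """Return (edge indices, total weight) of the heaviest color class.

    weights: nonnegative edge weights (padding edges carry weight 0).
    color:   color[i] in {0, 1, 2}; assumed to be a good 3-coloring.
    """
    total = defaultdict(float)
    members = defaultdict(list)
    for i, (w, c) in enumerate(zip(weights, color)):
        total[c] += w
        members[c].append(i)
    best = max(range(3), key=lambda c: total[c])
    return members[best], total[best]

# Hypothetical data: six edges, two per color.
weights = [4.0, 1.0, 0.0, 2.5, 3.0, 0.5]
color = [0, 1, 2, 0, 1, 2]
F, wF = heaviest_color_class(weights, color)
```

By averaging, the returned class always has weight at least one third of the total, which is the bound $w(F) \ge w(E)/3$ used above.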

4 Applications to Connectivity Augmentation and Related Topics 

This section applies our covering result (Theorem 3) to the design of approximation 
algorithms for some NP-hard problems in connectivity augmentation and related topics. 
The main application is to problem CBRA, which is stated below. Problem CBRA is 
equivalent to some other problems in this area, and so we immediately get some more 
applications. 

Recall problem CBRA: given a connected graph $T = (V, F)$, and a set of "supply"
edges $E$ with nonnegative weights $w : E \to \mathbb{R}_+$, the goal is to find a minimum-weight
subset $E'$ of $E$ such that $T + E' = (V, F \cup E')$ is 2-edge-connected. One application of
Theorem 3 is to give a 4/3-approximation algorithm for the special case of CBRA when
the LP relaxation has an optimal solution that is half-integral.

Theorem 7. Given a half-integral solution to the LP relaxation of CBRA of weight $z$,
there is an $O(|V|)$-time algorithm to find an integral solution (i.e., a feasible solution of
CBRA) whose weight is $\le \frac{4}{3} z$.

Proof. Problem CBRA may be restated as the problem of finding a minimum-weight
1-cover of a laminar family $\mathcal{H}$, where the 1-cover must be chosen from the set of supply
edges $E$ and each supply edge has a nonnegative weight. To specify $\mathcal{H}$, fix any node
$r \in V$ to be the root of $T$, and focus on the cut edges of $T$, call them $f_1, f_2, \ldots$. For
each of these cut edges $f_i$, let $H_i$ be the (node set of the) component of $T - f_i$
that does not contain $r$. We take $\mathcal{H} = \{H_1, H_2, \ldots\}$.

Let $x : E \to \{0, \frac{1}{2}, 1\}$ be a half-integral solution to the LP relaxation of CBRA, and
let $z = \sum_e w_e x_e$. Then $x$ corresponds to a 2-cover $C$ of $\mathcal{H}$, where $C$ has zero, one or
two copies of a supply edge $e$ iff $x_e = 0$, $\frac{1}{2}$, or $1$. By Theorem 3, $C$ contains a 1-cover
$C'$ whose weight is $\le 4z/3$, and moreover, $C'$ can be computed in time $O(|V|)$. □
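
The construction of the laminar family in this proof can be sketched in Python for the special case where $T$ is a tree, so that every edge is a cut edge. The function names and the 0-based node labels are assumptions made for this illustration only.

```python
def subtree_family(n, tree_edges, root=0):
    """For a tree on nodes 0..n-1, map each non-root node v to the node set of the
    component of T - (parent(v), v) that does not contain the root."""
    adj = [[] for _ in range(n)]
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    parent = {root: None}
    order = [root]
    for v in order:                      # BFS; the list grows while we scan it
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    below = {v: {v} for v in range(n)}
    for v in reversed(order):            # children are merged before their parents
        if parent[v] is not None:
            below[parent[v]] |= below[v]
    return {v: frozenset(below[v]) for v in range(n) if v != root}

def is_laminar(sets):
    """Any two sets are nested or disjoint."""
    sets = list(sets)
    return all(a <= b or b <= a or not (a & b) for a in sets for b in sets)

def is_1_cover(family, edges):
    """Every set of the family has at least one edge with exactly one end inside it."""
    return all(any((a in H) != (b in H) for a, b in edges) for H in family)

family = subtree_family(4, [(0, 1), (1, 2), (1, 3)])
sets = list(family.values())
```

In this small example the supply edges `[(2, 3), (0, 3)]` form a 1-cover of the family, while `[(2, 3)]` alone does not, because no edge leaves the set `{1, 2, 3}`.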

We have sharper results for the following (NP-complete) special case of problem CBRA. 
Tree Plus Cycle Problem (TPC) : 

INSTANCE: A tree $T = (W, F)$ whose set of leaf nodes is $V \subseteq W$, a "supply" cycle
$Q = (V, E)$ on the leaves of $T$ (i.e., $d_E(v) = 2$, $\forall v \in V$), and a positive integer $N$.
QUESTION: Is there a set of edges $E' \subseteq E$ with $|E'| \le N$ such that $T + E' =
(W, F \cup E')$ is 2-edge-connected?
Corollary 8. There exists a $\frac{4}{3}$-approximation algorithm for TPC. Moreover, there exists
a feasible solution $E' \subseteq E(Q)$ of size $\le 2|E(Q)|/3$.

Proof. Consider the LP relaxation of problem TPC; it is easy to verify that an optimal
solution is given by $x_e = 1/2$ for all supply edges $e \in E(Q)$. Now, the result follows
directly from Theorem 7. □
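
Spelled out, the arithmetic behind the corollary is the following. All weights are 1 in TPC, and OPT, the optimal value of the instance, is a symbol introduced here only for this remark.

```latex
z \;=\; \sum_{e \in E(Q)} x_e \;=\; \frac{|E(Q)|}{2},
\qquad
|E'| \;\le\; \frac{4}{3}\, z \;=\; \frac{2\,|E(Q)|}{3},
\qquad
z \;\le\; \mathrm{OPT} \;\Longrightarrow\; |E'| \;\le\; \frac{4}{3}\,\mathrm{OPT}.
```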

Now consider the following problem: given a $(2k - 1)$-edge-connected graph $T =
(V, F)$ and a set of "supply" edges $E$ with nonnegative weights $w : E \to \mathbb{R}_+$, the goal
is to find a minimum weight subset $E'$ of $E$ such that $G' = (V, F + E')$ is $(2k)$-edge-connected.
Since the edge connectivity of $T$ is odd, this problem is equivalent to
problem CBRA because all the $(2k - 1)$-cuts (minimum cuts) of $T$ can be represented
by means of a laminar family. (This follows easily from the fact that the node sets of
two minimum cuts do not cross in this case.)



5 NP-completeness Results 

First, we show that problem TPC (tree plus cycle) is NP-complete. It is convenient to 
reformulate TPC in terms of a laminar family rather than a tree. 

Laminar Family Plus Cycle Problem (LPC) : 

INSTANCE: A laminar family $\mathcal{H}$ on a node set $V$, a cycle $Q = (V, E)$ on $V$, and a
positive integer $N$. (Assume $\emptyset, V \notin \mathcal{H}$.)

QUESTION: Is there a 1-cover $E'$ of $\mathcal{H}$ such that $E' \subseteq E$ and $|E'| \le N$?

We give a polynomial-time reduction from the 3-dimensional matching problem to 
problem LPC. Our reduction is based on the proof of [FJ 81, Theorem 2]. 

Theorem 9. Problem LPC is NP-complete. 

Proof. It is easy to see that LPC is in NP. Given an instance of 3DM (that is, three
disjoint sets $W, X, Y$ of cardinality $q$ each, and a set $M$ of 3-edges (triples) $(w_i x_j y_k) \in
W \times X \times Y$), construct a connected graph $T$ as follows. First build a star with a "root"
$r$ and $3q$ leaves $\{w_1, \ldots, w_q, x_1, \ldots, x_q, y_1, \ldots, y_q\}$ corresponding to the elements of
$W \cup X \cup Y$. Then for each 3-edge $(w_i x_j y_k)$ of $M$ add two nodes $a_{ijk}$ and $\bar{a}_{ijk}$ to
$T$ and add the edges $w_i a_{ijk}$, $w_i \bar{a}_{ijk}$. Now replace each of the $2q$ nodes corresponding
to elements of $X$ and $Y$ by complete graphs (or arbitrary 2-edge-connected graphs)
denoted by $X_1, \ldots, X_q, Y_1, \ldots, Y_q$ as follows. Each complete subgraph of this type has
$d_M(x_j) \cdot 8q$ ($d_M(y_k) \cdot 8q$) nodes and is partitioned into $d_M(x_j)$ ($d_M(y_k)$) parts (so-called
"lanes") of size $8q$ each. (Here, $d_M(x_j)$ and $d_M(y_k)$ denote the number of 3-edges of
$M$ containing $x_j$, respectively, $y_k$.) The graph constructed is connected and has $2p + 2q$
"leaves" (that is, leaf 2-edge-connected components), where $p := |M|$.

The next step is to define the cycle $Q$. The nodes of $Q$ are the nodes of the leaves
of $T$. Hence, $|V(Q)| = p(16q + 2)$. First, we define $p$ disjoint paths of $Q$ such that
each has $16q + 2$ nodes (so each of these paths has length $16q + 1$). Every 3-edge
$(w_i x_j y_k)$ of $M$ defines such a path as follows: take the $8q$ nodes (and edges connecting
the consecutive ones) $l_1 l_2 \ldots l_{8q}$ of a lane of $X_j$ in an arbitrary order, then take the edges
$l_{8q} a_{ijk}$, $a_{ijk} \bar{a}_{ijk}$, $\bar{a}_{ijk} m_{8q}$ for some node $m_{8q}$ of some lane of $Y_k$, in this order, and then
take the other nodes of this lane $m_{8q-1}, \ldots, m_1$ in an arbitrary order. The lanes are
chosen in such a way that these paths are pairwise disjoint. This can be done, since the
lanes are pairwise disjoint and each $X_j$ (or $Y_k$) has $d_M(x_j)$ (or $d_M(y_k)$) lanes.

Now fix a cyclic ordering $e_1, \ldots, e_p$ of the 3-edges of $M$ and complete the cycle
$Q$ by adding the missing $p$ edges in such a way that the end of the path corresponding
to $e_s = (w_i x_j y_k)$ (that is, a node $m_1$ of a lane in $Y_k$) is connected to the first node
of the path corresponding to $e_{s+1} = (w_{i'} x_{j'} y_{k'})$ (that is, to a node of a lane of $X_{j'}$)
for $1 \le s \le p$. Note that each of these edges connects a complete subgraph $X_j$ to a
complete subgraph $Y_k$ and all the edges of $Q$ either connect different leaves of $T$ or
connect different nodes of some leaf of $T$. Furthermore, $V(Q)$ equals the union of the
nodes of the leaves of $T$.

The last part of the reduction consists of defining a laminar family $\mathcal{H}$ on $V(Q)$.
We define $\mathcal{H}$ by defining two disjoint subfamilies $\mathcal{H}_1$ and $\mathcal{H}_2$. Let $\mathcal{H}_1 := \{S \cap
V(Q) : d_T(S) = 1, r \notin S, S \subset V(T)\}$ contain intersections of $V(Q)$ and those
minimum cuts of $T$ which do not contain the root. It is easy to see that this family is
laminar. $\mathcal{H}_2$ consists of $2p$ disjoint collections, each of them defined on the nodes of a lane
of a complete subgraph of the form $X_j$ or $Y_k$ of $T$ as follows. Let us fix such a subgraph,
say $X_1$. (The definition is similar for all the $2q$ subgraphs $X_1, \ldots, X_q, Y_1, \ldots, Y_q$.)
Focus on a lane $l_1, \ldots, l_{8q}$ of $X_1$, where the numbering follows the ordering of these nodes
in $Q$. (Hence $l_{8q}$ is connected to some leaf $a_{ijk}$ and $l_1$ is connected to some $Y_k$.) This lane
adds the following sets to $\mathcal{H}_2$: the singletons $l_1, \ldots, l_{8q}$, the sets of nodes of the intervals
of $Q$ with end-node pairs $(l_{8q-1}, l_{8q-s})$ ($2 \le s \le 4q$) and $(l_{4q-r}, l_2)$ ($r \le 4q - 3$).
Each lane of every complete subgraph $X_j$, $Y_k$ ($1 \le j, k \le q$) adds a similar collection
to $\mathcal{H}_2$. Clearly, every collection of this type is laminar, and the collections are defined
on pairwise disjoint sets of nodes, where each of these sets is included in a minimal
element of $\mathcal{H}_1$. Therefore $\mathcal{H}$ is a laminar family on $V(Q)$, where $\mathcal{H} := \mathcal{H}_1 \cup \mathcal{H}_2$. Note
that each node of $Q$ belongs to $\mathcal{H}$ as a singleton set.

Observe the following important property, which follows from the structure of these
collections and the fact that every node of $Q$ belongs to $\mathcal{H}$. Let $E' \subseteq E(Q)$ be a 1-cover
of $\mathcal{H}$. Then

(*) if the edge $l_{8q} l_{8q-1}$ (or similarly $l_1 l_2$, $m_{8q} m_{8q-1}$, $m_1 m_2$) for some lane in
an $X_j$ or $Y_k$ is not in $E'$ then $|E'| \ge |V(Q)|/2 + 2q - 1$.

It is easy to see that our reduction is polynomial. We claim that there exists a solution
to the given instance of 3DM (that is, a set of $q$ pairwise disjoint 3-edges of $M$) if and
only if $\mathcal{H}$ has a 1-cover of size at most $p + 8pq + q = |V(Q)|/2 + q$.

First observe that a set $E' \subseteq E(Q)$ is a 1-cover of $\mathcal{H}$ if and only if $T + E'$ is 2-edge-connected
and $E'$ covers each member of $\mathcal{H}_2$. Moreover, as was verified in [FJ 81], there is a
3-dimensional matching if and only if there is a set $E^*$ of $p + q$ edges in $Q^*$ for which
$T^* + E^*$ is 2-edge-connected, where $T^*$ and $Q^*$ arise from $T$ and $Q$, respectively, by
contracting the complete subgraphs (that is, the sets of the form $X_j$, $Y_k$, which are 2-edge-connected)
to singletons and deleting the edges connecting these complete subgraphs
from the cycle.

Suppose that there exists a 3-dimensional matching $M' \subseteq M$. Then there exists a set
$E^*$ of size $p + q$ which makes $T^*$ 2-edge-connected and it is easy to see that there exists
a set $E''$ of independent edges in $Q$ which covers $\mathcal{H}_2$. Hence $|E''| = 16qp/2 = 8pq$.
Now $E' := E^* \cup E''$ covers $\mathcal{H}$ and $|E'| = 8pq + p + q$, as required.

The proof of the other direction (which relies on (*)) is omitted. □

Corollary 10. The following problem is NP-hard: given a 2-cover $C$ of a laminar family
$\mathcal{H}$, find a minimum-size 1-cover that is contained in $C$.

Theorem 11. The following problem is NP-hard: given a 2-packing $P$ of a capacitated
laminar family $\mathcal{H}, u$, find a maximum-size 1-packing that is contained in $P$.

6 Conclusions 

We suspect that our bounds on the ratios for 1-covers versus 2-covers and for 1-packings
versus 2-packings hold in general.

1-Cover Conjecture: Consider the integer program for a minimum weight
1-cover of a laminar family and its LP relaxation (see Section 1). We conjecture
that the ratio of the optimal values is at most 4/3.

1-Packing Conjecture: Consider the integer program for a maximum weight
1-packing of a capacitated laminar family and its LP relaxation (see Section 1).
We conjecture that the ratio of the optimal values is at least 2/3.

Another interesting question is to find sufficient conditions on the laminar family
$\mathcal{H}$ (or, on the tree $T(\mathcal{H})$ representing $\mathcal{H}$) such that the LP relaxation has half-integral
extreme point solutions. As noted in Section 1, the LP relaxation has integral extreme
point solutions iff $T(\mathcal{H})$ is a path.
Abstract. Let $G$ be a finite group. Choose a set $S$ of size $k$ uniformly from $G$
and consider a lazy random walk on the corresponding Cayley graph $\Gamma(G, S)$.
We show that for almost all choices of $S$ given $k = 2a \log_2 |G|$, $a > 1$, we have
$\mathrm{Re}\, \lambda_1 \le 1 - 1/2a$. A similar but weaker result was obtained earlier by Alon and
Roichman (see [4]).



1 Introduction 

In the past few years there has been significant progress in the analysis of random Cayley
graphs. Still, for general groups $G$ and small sets of generators, such as of size $O(\log |G|)$,
more progress is yet to be made. Our results partially fill this gap.

Here is the general setup of the problem. Let $G$ be a finite group, $n = |G|$. For a given
$k$ choose uniformly $k$ random elements $g_1, \ldots, g_k \in G$. Denote by $S$ the set of these
elements. By $\Gamma(G, S)$ denote the corresponding oriented Cayley graph. Define the transition
matrix $A = (a_{g,h})$, $g, h \in G$, by $a_{g,h} = 1/k$ if $g^{-1}h \in S$, and $a_{g,h} = 0$ otherwise. By
$1 = |\lambda_0| \ge |\lambda_1| \ge \cdots$ denote the eigenvalues of $A$. Note that since $A$ is a real matrix,
its eigenvalues are either real, or complex numbers which appear in complex conjugate
pairs.
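
For a concrete feel of this setup, here is a minimal Python sketch for the cyclic group $\mathbb{Z}_n$ (an assumption made purely for illustration; the paper treats arbitrary finite groups). For an abelian group the eigenvalues of $A$ are the character sums $\frac{1}{k}\sum_{s \in S} \omega^{js}$ with $\omega = e^{2\pi i/n}$, and the lazy walk has transition matrix $(I + A)/2$.

```python
import cmath
import random

def cayley_eigenvalues(n, gens):
    """Eigenvalues of A for the Cayley graph of Z_n with generator list gens."""
    k = len(gens)
    return [sum(cmath.exp(2j * cmath.pi * j * s / n) for s in gens) / k
            for j in range(n)]

def lazy(eigs):
    """Eigenvalues of the lazy walk (I + A) / 2."""
    return [(1 + lam) / 2 for lam in eigs]

random.seed(1)
n, k = 64, 12                      # k = 2a * log2(n) with a = 1
gens = [random.randrange(n) for _ in range(k)]
eigs = cayley_eigenvalues(n, gens)
# Largest real part among the nontrivial eigenvalues of the lazy walk.
second = max(lam.real for lam in lazy(eigs)[1:])
```

The list `eigs` exhibits the conjugate-pair structure mentioned above: `eigs[j]` and `eigs[n - j]` are complex conjugates.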

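Theorem 1. Let $G$ be a finite group, $n = |G|$. Let $\varepsilon > 0$, $a > 1$ be given. Then

$$\Pr\left(\mathrm{Re}\, \lambda_1 > 1 - \frac{1}{2a}\right) \to 0 \quad \text{as } n \to \infty,$$

where the probability is taken over all choices of $S = \{g_1, \ldots, g_k\}$ of size
$k \ge 2a \log_2 n$.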



In other words, we get a constant expansion for $k = \Omega(\log |G|)$. In particular, this
implies the $O(\log |G|)$ mixing time for random random walks (see [1,8,12]). Our technique
is based on careful analysis of such random walks and is obtained as an application
of the Erdős-Rényi results on random subproducts (see [10]).

A similar result for general groups was first explored by Alon and Roichman in [4],
where the authors considered symmetric Cayley graphs $\Gamma(G, S)$, $S = S^{-1}$. Clearly, the
transition matrix $A$ then has only real eigenvalues. The authors showed that when $k = \Omega(\log n)$
the second largest eigenvalue $\lambda_2$ of the random graph is bounded by a constant. Formally, they
showed that given $1 > \delta > 1/e$ and k > (1 + o(1))2e^ln2/($\delta$e − 1), we have $E(\lambda_2) < \delta$. Alon
and Roichman's analysis relies on Wigner's semicircle law and seems impossible to
generalize to this case. Note also that our technique gives a bound for the case $\delta < 1/e$.

Finally, we should mention that there are a number of results on the mixing time on
random Cayley graphs, rather than on the relaxation times we consider in this paper (see
e.g. [8,12,13,14]). In [8,13] the authors consider the case when $k = \Omega(\log^a n)$, where $a > 1$.
The analysis in [13] also gives bounds on the eigenvalue gap in this case.



2 Random Random Walks 

A lazy random walk $\mathcal{W} = \mathcal{W}(G, S)$ is defined as a finite Markov chain $X_t$ with state
space $G$, and such that $X_0 = e$,

$$X_{t+1} = X_t \cdot g_i^{\varepsilon},$$

where $i = i(t)$ are independent and uniform in $[k] = \{1, \ldots, k\}$; $\varepsilon = \varepsilon(t)$ are independent
and uniform in $\{0, 1\}$. By $Q^m$ denote the probability distribution of $X_m$. If $S$ is a set of
generators, then $Q^m(g) \to 1/|G|$, i.e. the walk $\mathcal{W}$ has a uniform stationary distribution
$U$, $U(g) = 1/n$ for all $g \in G$.

Define the separation distance

$$s(m) = |G| \max_{g \in G} \left(\frac{1}{|G|} - Q^m(g)\right).$$

It is easy to see that $0 \le s(m) \le 1$. It is also known (see e.g. [2]) that $s(m + 1) \le s(m)$
for all $m > 0$, and $s(m + l) \le s(m) \cdot s(l)$.
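
The separation distance is easy to compute exactly for small examples. The sketch below does so for the lazy walk on the cyclic group $\mathbb{Z}_n$, written additively; the choice of group, the generator list and the function names are assumptions for illustration only.

```python
def lazy_step(dist, gens):
    """One step of the lazy walk on Z_n: stay with probability 1/2,
    otherwise add a uniformly chosen generator."""
    n, k = len(dist), len(gens)
    new = [0.5 * p for p in dist]
    for g, p in enumerate(dist):
        for s in gens:
            new[(g + s) % n] += 0.5 * p / k
    return new

def separation_profile(n, gens, steps):
    """Return [s(1), ..., s(steps)] for the walk started at the identity."""
    dist = [0.0] * n
    dist[0] = 1.0
    out = []
    for _ in range(steps):
        dist = lazy_step(dist, gens)
        out.append(n * max(1.0 / n - p for p in dist))
    return out

profile = separation_profile(15, [1, 4, 6], 60)
```

On this example the values in `profile` stay in $[0, 1]$ and never increase, in line with the two properties quoted above.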

The general problem is to find the smallest $m$ such that $s(m) \le \varepsilon$ for almost all choices
of $S$. Clearly, if $k$ is small enough, then almost surely $S$ is not a set of generators and
$s(m) = 1$, $d(m) \ge 1/2$. The example of $G = \mathbb{Z}_2^r$ shows that if $k < r = \log_2 n$ this is the
case. Thus it is reasonable to consider only the case $k \ge \log_2 n$.

It is not hard to see (see [3,?]) that $s(m) \sim (\mathrm{Re}\, \lambda_1)^m$ as $m \to \infty$. Thus if we can
prove an upper bound for the separation distance with high probability, this will give the
desired eigenvalue bound.



3 Random Subproducts 

Let $G$ be a finite group, $n = |G|$. Throughout the paper we will ignore a small difference
between random subsets $S$ and random sequences $J$ of group elements. The reason is
that the two concepts are virtually identical since the probability of repetition of elements
(having $g_i = g_j$, $1 \le i < j \le k$) when $k = O(\log n)$ is exponentially small. Thus in the
future we will substitute uniform sets $S$ of size $k$ by the uniform sequences $J \in G^k$,
which, of course, can have repeated elements.
Fix a sequence $J = (g_1, \ldots, g_k) \in G^k$. Random subproducts are defined as

$$g_1^{\varepsilon_1} \cdot \ldots \cdot g_k^{\varepsilon_k},$$

where $\varepsilon_i \in \{0, 1\}$ are given by independent unbiased coin flips. Denote by $P_J$ the
probability distribution of the random subproducts on $G$. Erdős and Rényi showed in [10]
that if $g_1, \ldots, g_k$ are chosen uniformly and independently, then:

$$(*) \quad \Pr\left(\max_{g \in G} \left|P_J(g) - \frac{1}{n}\right| \le \frac{\varepsilon}{n}\right) > 1 - \delta \quad \text{for } k > 2\log_2 n + 2\log_2 1/\varepsilon + \log_2 1/\delta.$$

The proof of the Theorem is based on (*).
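
To make the notion of random subproducts concrete, the following Python sketch computes the distribution $P_J$ exactly by enumerating all $2^k$ coin-flip outcomes. It works in the cyclic group $\mathbb{Z}_n$ written additively, an assumption made only to keep the example short; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def subproduct_distribution(n, J):
    """Exact distribution of e1*g1 + ... + ek*gk (mod n) over uniform e in {0,1}^k."""
    counts = [0] * n
    for eps in product((0, 1), repeat=len(J)):
        counts[sum(e * g for e, g in zip(eps, J)) % n] += 1
    return [Fraction(c, 2 ** len(J)) for c in counts]

def deviation(n, P):
    """n * max_g |P(g) - 1/n|: the smallest epsilon for which P is within epsilon/n of uniform."""
    return n * max(abs(p - Fraction(1, n)) for p in P)

P_good = subproduct_distribution(16, [1, 2, 4, 8])   # binary digits: exactly uniform
P_bad = subproduct_distribution(16, [2, 4, 8, 0])    # only even elements are reached
```

The Erdős-Rényi estimate says that for uniformly random $J$ of sufficient length the deviation computed here is small with high probability; the two hand-picked sequences show the extremes.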

Let $m > 2\log_2 |G|$, and let $J$ be as above. Denote by $Q_J$ the probability
distribution $Q_J^m$ of the lazy random walk $\mathcal{W}(G, S)$ after $m$ steps, where $S = S(J)$ is the
set of elements in $J$. Suppose we can show that with probability $> 1 - \alpha/2$ we have
$s_J(m) = n \max_{g \in G}(1/n - Q_J^m(g)) < \alpha/2$, where $\alpha \to 0$ as $n \to \infty$. This would imply
the theorem. Indeed, we have

$$E[s_J(m)] \le \Pr(s_J \le \alpha/2) \cdot \alpha/2 + \Pr(s_J > \alpha/2) \cdot 1 < (1 - \alpha/2)\,\alpha/2 + \alpha/2 < \alpha \to 0.$$

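By definition, $Q_J$ is distributed as random subproducts

$$g_{i_1}^{\varepsilon_1} \cdot \ldots \cdot g_{i_m}^{\varepsilon_m},$$

where $i_1, \ldots, i_m$ are uniform and independent in $[k] = \{1, \ldots, k\}$.

Let $J = (g_1, \ldots, g_k)$ be fixed. For a given $I = (i_1, \ldots, i_m) \in [k]^m$, consider
$J(I) = (g_{i_1}, \ldots, g_{i_m})$ and $R_I = P_{J(I)}$. By definition of a lazy random walk we have

$$Q_J = \frac{1}{k^m} \sum_{I \in [k]^m} R_I.$$

We will show that for almost all choices of $J$ and $I$, the probability distribution $R_I$ is
almost uniform.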

Let $I = (i_1, \ldots, i_m) \in [k]^m$ be a sequence. Define an L-subsequence $I' = (i_{r_1}, \ldots, i_{r_l})$
to satisfy $1 \le r_1 < \cdots < r_l \le m$, and for all $j$, $1 \le j \le m$, there exists a unique $t$,
$1 \le t \le l$, such that $r_t \le j$ and $i_{r_t} = i_j$. In other words, we read numbers in $I$, and
whenever we find a new number, we add it to $I'$. For example, if $I = (2, 7, 5, 1, 2, 3, 2, 5, 6)$,
then $I' = (2, 7, 5, 1, 3, 6)$ is an L-subsequence of length 6. Note that by definition the
L-subsequence is always unique.
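
The "read and keep every new number" description translates directly into code. A minimal Python sketch (the function name is chosen here for illustration):

```python
def l_subsequence(I):
    """Keep the first occurrence of every value of I, in order of appearance."""
    seen, out = set(), []
    for x in I:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

example = l_subsequence([2, 7, 5, 1, 2, 3, 2, 5, 6])
```

Applied to the sequence from the text, this returns the subsequence of length 6 given there.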

Lemma 1. Let $I$, $J$ be as above, $n = |G|$. Let $I'$ be an L-subsequence of $I$. Then
for all $\alpha, \beta > 0$ we have: $\max_{g \in G} |R_{I'}(g) - 1/n| \le \alpha/n$ with probability $1 - \beta$ implies
$\max_{g \in G} |R_I(g) - 1/n| \le \alpha/n$ with probability $1 - \beta$.

Lemma 2. Let $\beta > 0$, $a > 1$, $k = al$. Consider the probability $P(l)$ that a sequence
$I \in [k]^m$ does not contain an L-subsequence $I'$ of length $l$. Then $P(l) \to 0$ as $l \to \infty$.



4 Proof of the Theorem

First we deduce the Theorem from the lemmas and then prove the lemmas.

Proof of Theorem 1. Let $I'$ be an L-subsequence of $I$ of length $l > 2\log_2 n + 3\log_2 1/\delta$.
Since numbers in $I'$ are all different, for at least a $(1 - \delta)$ fraction of all $J = \{g_1, \ldots, g_k\}$,
we have

$$\max_{g \in G} \left|R_{I'}(g) - \frac{1}{n}\right| \le \frac{\delta}{n}.$$

Indeed, this is a restatement of (*) with $\varepsilon = \delta$.

Note here that we do not require the actual group elements $g_{i_j}$, $i_j \in I'$, to be different. By
coincidence they can be the same. But we do require that numbers in $I'$ are all different,
so that the corresponding group elements are independent.

Let $l = \lceil 2\log_2 n + 3\log_2 1/\delta \rceil$, $k > al$, and $m > (1 + \varepsilon)\, a l \ln\frac{a}{a-1}$. Denote by $P(l)$
the probability that a uniformly chosen $I \in [k]^m$ does not contain an L-subsequence of
length $l$. By Lemma 1, with probability $> (1 - P(l))(1 - \delta)$ we have

$$\max_{g \in G} \left|R_I(g) - \frac{1}{n}\right| \le \frac{\delta}{n},$$

where the probability is taken over all $I \in [k]^m$ and all $J \in G^k$. Setting $\delta = \delta(\alpha, \varepsilon, n)$
small enough we immediately obtain $s_J(m) < \alpha/2$ with probability $> (1 - \alpha/2)$, where
the probability is taken over all $J \in G^k$. By the observations above, this is exactly what
we need to prove the theorem.

Now take 6 = a/ A, (3 = e/2, a = 4/a/ By Lemma 2, and and since I > log 2 nwehave 
$P(l) \le \alpha/4$ for $n$ large enough. We conclude $(1 - P(l))(1 - \delta) > (1 - \alpha/4)^2 > 1 - \alpha/2$.
This finishes the proof of Theorem 1. □





5 Proof of Lemmas 



Proof of Lemma 1. For any $x, y \in G$ denote by $y^x$ the element $x y x^{-1} \in G$. Clearly, if
$y$ is uniform in $G$ and independent of $x$, then $y^x$ is also uniform in $G$.

Let $Q$ be a distribution on a group $G$ which depends on $J \in G^m$ and takes values
in $G$. We call $Q$ $(\alpha, \beta)$-good if with probability $> (1 - \beta)$ it satisfies the inequality
$\max_{g \in G} |Q(g) - 1/n| \le \alpha/n$.

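Consider the following random subproducts:

$$h = g_1^{\varepsilon_1} \cdot \ldots \cdot g_r^{\varepsilon_r} \cdot x \cdot g_{r+1}^{\varepsilon_{r+1}} \cdot \ldots \cdot g_l^{\varepsilon_l},$$

where $x$ is fixed, while $g_1, \ldots, g_l$ are uniform and independent in $G$, and $\varepsilon_1, \ldots, \varepsilon_l$ are
uniform and independent in $\{0, 1\}$. We have

$$h = g_1^{\varepsilon_1} \cdot \ldots \cdot g_r^{\varepsilon_r} \cdot (g_{r+1}^x)^{\varepsilon_{r+1}} \cdot \ldots \cdot (g_l^x)^{\varepsilon_l} \cdot x.$$

Thus $h \cdot x^{-1}$ is distributed as $R_I$, $I = (1, 2, \ldots, l)$. Therefore if $R_I$ is $(\alpha, \beta)$-good, then the
distribution of $h$ is also $(\alpha, \beta)$-good.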
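
The step of pulling the fixed element $x$ to the right rests on the identity $x \cdot g = g^x \cdot x$ with $g^x = x g x^{-1}$. The following Python sketch checks the resulting rewriting of $h$ on random permutations; the choice of the symmetric group and all helper names are assumptions made for this illustration.

```python
import random

def mul(p, q):
    """Product p*q of permutations given as tuples (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inv(p):
    r = [0] * len(p)
    for i, v in enumerate(p):
        r[v] = i
    return tuple(r)

def conj(y, x):
    """y^x = x y x^{-1}."""
    return mul(mul(x, y), inv(x))

def product_of(elems, n):
    out = tuple(range(n))
    for e in elems:
        out = mul(out, e)
    return out

def check_pull_right(n=5, l=6, r=3, trials=200, seed=0):
    """Verify g1^e1..gr^er * x * g(r+1)^e.. = g1^e1..gr^er * (g(r+1)^x)^e.. * x."""
    rng = random.Random(seed)
    ident = tuple(range(n))
    for _ in range(trials):
        g = [tuple(rng.sample(range(n), n)) for _ in range(l)]
        x = tuple(rng.sample(range(n), n))
        eps = [rng.randint(0, 1) for _ in range(l)]
        head = [g[i] if eps[i] else ident for i in range(r)]
        tail = [g[i] if eps[i] else ident for i in range(r, l)]
        tail_conj = [conj(g[i], x) if eps[i] else ident for i in range(r, l)]
        if product_of(head + [x] + tail, n) != product_of(head + tail_conj + [x], n):
            return False
    return True

identity_holds = check_pull_right()
```

Since conjugation by a fixed $x$ is a bijection of $G$, the conjugated elements are again uniform and independent, which is the point used in the proof.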
Similarly, let $x, y, \ldots$ be fixed group elements. Then random subproducts

$$h = g_1^{\varepsilon_1} \cdot \ldots \cdot x \cdot g_r^{\varepsilon_r} \cdot \ldots \cdot y \cdot g_l^{\varepsilon_l} \cdot \ldots$$

are distributed as $R_I \cdot f(x, y, \ldots)$, $I = (1, \ldots, r, \ldots, l, \ldots)$. Indeed, pull the rightmost
fixed element all the way to the right, then pull the previous one, etc. We conclude that if
$R_I$ is $(\alpha, \beta)$-good, then the distribution of $h$ is also $(\alpha, \beta)$-good. Note that in the observation
above we can relax the condition that the elements $x, y, \ldots$ are fixed. Since we do not have
to change their relative order, it is enough to require that they are independent of the
elements $g_i$ to the right of them.

Now let $I = (i_1, \ldots, i_m) \in [k]^m$, and let $I'$ be an L-subsequence of $I$. Define $Q(h)$
to be the distribution of random subproducts

$$h = g_{i_1}^{\varepsilon_1} \cdot \ldots \cdot g_{i_m}^{\varepsilon_m},$$

where all the powers $\varepsilon_j$ are fixed except for those of $j \in I'$. We claim that if $R_{I'}$ is
$(\alpha, \beta)$-good, then $Q(h)$ is also $(\alpha, \beta)$-good. Indeed, pull all the elements that are not
in $I'$ to the right. By definition of the L-subsequence, the elements in $I'$ to the right of
those that are not in $I'$ must be different and thus independent of each other. Thus by
the observation above $Q(h)$ is also $(\alpha, \beta)$-good.

Now, the distribution $R_I$ is defined as an average of the distributions $Q(h)$ over all
of the $2^{m-l}$ choices of values $\varepsilon_s$ of elements not in $I' = (i_{r_1}, \ldots, i_{r_l})$. Observe that for
fixed $g_1, \ldots, g_k$ and different choices of the $\varepsilon_s$, $s \ne r_j$, the distributions of subproducts $h$ can
be obtained by a shift from each other (i.e. by multiplication by a fixed group element).
Therefore each of these distributions has the same separation distance. In other words,
each of the $J$ is either "good" altogether or "bad" altogether for all $2^{m-l}$ choices.
Therefore after averaging we obtain an $(\alpha, \beta)$-good distribution $R_I$. This finishes the proof
of the lemma. □

Proof of Lemma 2. The problem is equivalent to the following question. What is the
asymptotic behavior of the probability that in the usual coupon collector's problem with
$k$ coupons, after $m$ trials we have at least $l$ different coupons? Indeed, observe that if all
$m$ chosen coupons correspond to elements in a sequence $I \in [k]^m$, then distinct coupons
correspond to an L-subsequence $I'$ of length $l$. Note that in our case $k = al$ and $m \to \infty$.

Let $\tau$ be the first time we collect $l$ out of $k$ possible coupons. Let us compute the
expected time $E(\tau)$. We have

$$E(\tau) = \frac{k}{k} + \frac{k}{k-1} + \cdots + \frac{k}{k-l+1}.$$

When $k = al$, we obtain

$$E(\tau) = al \left(\ln\frac{a}{a-1} + o(1)\right).$$

Now let $m = (1 + \beta)\, E(\tau)$. The probability $1 - P(l)$ that after $m$ trials we collect $l$
coupons is equal to $\Pr(\tau \le m)$. Now use a geometric bound on the distribution to obtain
the result. We skip the easy details. □
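
The expected collection time is easy to check numerically. The Python sketch below compares the exact sum $\sum_{i=0}^{l-1} k/(k-i)$ with the asymptotic value $al \ln\frac{a}{a-1}$ and with a small simulation; the parameter values and function names are arbitrary choices for illustration.

```python
import math
import random

def expected_time(k, l):
    """Exact expected number of draws needed to see l distinct coupons out of k."""
    return sum(k / (k - i) for i in range(l))

def simulate(k, l, rng):
    """Number of uniform draws from k coupons until l distinct ones were seen."""
    seen, t = set(), 0
    while len(seen) < l:
        seen.add(rng.randrange(k))
        t += 1
    return t

a, l = 2, 500
k = a * l
exact = expected_time(k, l)
approx = a * l * math.log(a / (a - 1))

rng = random.Random(0)
runs = [simulate(k, l, rng) for _ in range(200)]
empirical = sum(runs) / len(runs)
```

With $a = 2$ the collector needs about $2l \ln 2 \approx 1.39\, l$ draws, so a walk of length $m = (1 + \beta)E(\tau)$ is only a constant factor longer than the L-subsequence it must contain.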
Abstract. In this paper we study the problem of recognizing and representing
dynamically changing proper interval graphs. The input to the problem consists
of a series of modifications to be performed on a graph, where a modification can
be a deletion or an addition of a vertex or an edge. The objective is to maintain
a representation of the graph as long as it remains a proper interval graph, and to
detect when it ceases to be so. The representation should enable one to efficiently
construct a realization of the graph by an inclusion-free family of intervals. This
problem has important applications in physical mapping of DNA.

We give a near-optimal fully dynamic algorithm for this problem. It operates
in time $O(\log n)$ per edge insertion or deletion. We prove a close lower bound
of $\Omega(\log n/(\log\log n + \log b))$ amortized time per operation in the cell probe
model with word-size $b$. We also construct optimal incremental and decremental
algorithms for the problem, which handle each edge operation in $O(1)$ time.



1 Introduction 

A graph G is called an interval graph if its vertices can be assigned to intervals on the 
real line so that two vertices are adjacent in G iff their intervals intersect. The set of 
intervals assigned to the vertices of G is called a realization of G. If the set of intervals 
can be chosen to be inclusion-free, then G is called a proper interval graph. Proper 
interval graphs have been studied extensively in the literature (cf. [7,13]), and several 
linear time algorithms are known for their recognition and realization [2,3]. 

This paper deals with the problem of recognizing and representing dynamically 
changing proper interval graphs. The input is a series of operations to be performed on a 
graph, where an operation is any of the following: Adding a vertex (along with the edges 
incident to it), deleting a vertex (and the edges incident to it), adding an edge and deleting 
an edge. The objective is to maintain a representation of the dynamic graph as long as it 
is a proper interval graph, and to detect when it ceases to be so. The representation should
enable one to efficiently construct a realization of the graph. In the incremental version 
of the problem, only addition operations are permitted, i.e., the operations include only 
the addition of a vertex and the addition of an edge. In the decremental version of the 
problem only deletion operations are allowed. 
The motivation for this problem comes from its application to physical mapping of
DNA [1]. Physical mapping is the process of reconstructing the relative position of DNA
fragments, called clones, along the target DNA molecule, prior to their sequencing, based 
on information about their pairwise overlaps. In some biological frameworks the set of 
clones is virtually inclusion-free - for example when all clones have a similar length (this 
is the case for instance for cosmid clones). In this case, the physical mapping problem 
can be modeled using proper interval graphs as follows. A graph G is built according to 
the biological data. Each clone is represented by a vertex and two vertices are adjacent 
iff their corresponding clones overlap. The physical mapping problem then translates to 
the problem of finding a realization of G, or determining that none exists. 

Had the overlap information been accurate, the two problems would have been 
equivalent. However, some biological techniques may occasionally lead to an incorrect 
conclusion about whether two clones intersect, and additional experiments may change 
the status of an intersection between two clones. The resulting changes to the corre- 
sponding graph are the deletion of an edge, or the addition of an edge. The set of clones 
is also subject to changes, such as adding new clones or deleting ’bad’ clones (such as 
chimerics [14]). These translate into addition or deletion of vertices in the corresponding 
graph. Therefore, we would like to be able to dynamically change our graph, so as to 
reflect the changes in the biological data, as long as they allow us to construct a map, 
i.e., as long as the graph remains a proper interval graph. 

Several authors have studied the problem of dynamically recognizing and representing
certain graph families. Hsu [10] has given an $O(m + n \log n)$-time incremental algorithm
for recognizing interval graphs. (Throughout, we denote the number of vertices in
the graph by $n$ and the number of edges in it by $m$.) Deng, Hell and Huang [3] have given
a linear-time incremental algorithm for recognizing and representing connected proper
interval graphs. This algorithm requires that the graph remain connected throughout
the modifications. In both algorithms [10,3] only vertex increments are handled.
Recently, Ibarra [11] found a fully dynamic algorithm for recognizing chordal graphs,
which handles each edge operation in $O(n)$ time, or alternatively, an edge deletion in
$O(n \log n)$ time and an edge insertion in $O(n/\log n)$ time.

Our results are as follows: For the general problem of recognizing and representing
proper interval graphs we give a fully dynamic algorithm which handles each operation
in time $O(d + \log n)$, where $d$ denotes the number of edges involved in the operation.
Thus, in case a vertex is added or deleted, $d$ equals its degree, and in case an edge is added
or deleted, $d = 1$. Our algorithm builds on the representation of proper interval graphs
given in [3]. We also prove a lower bound for this problem of $\Omega(\log n/(\log\log n + \log b))$
amortized time per edge operation in the cell probe model of computation with word-size
$b$ [16]. It follows that our algorithm is nearly optimal (up to a factor of $O(\log\log n)$).

For the incremental and the decremental versions of the problem we give optimal
algorithms (up to a constant factor) which handle each operation in time $O(d)$. For the
incremental problem this generalizes the result of [3] to arbitrary instances.

As a part of our general algorithm we give a fully dynamic procedure for maintaining 
connectivity in proper interval graphs. The procedure receives as input a sequence of 
operations each of which is a vertex addition or deletion, an edge addition or deletion, 
or a query whether two vertices are in the same connected component. It is assumed 




A Fully Dynamic Algorithm for Recognizing and Representing Proper Interval Graphs 529 



that the graph remains proper interval throughout the modifications, since otherwise 
our main algorithm detects that the graph is no longer a proper interval graph and 
halts. We show how to implement this procedure in $O(\log n)$ time per operation. In
comparison, the best known algorithms for maintaining connectivity in general graphs
require $O(\log^2 n)$ amortized time per operation [9], or $O(\sqrt{n})$ worst-case (deterministic)
time per operation [4]. We also show that the lower bound of Fredman and Henzinger [5]
of $\Omega(\log n/(\log\log n + \log b))$ amortized time per operation (in the cell probe model
with word-size $b$) for maintaining connectivity in general graphs, applies to the problem
of maintaining connectivity in proper interval graphs.

The paper is organized as follows: In section 2 we give the basic background and 
describe our representation of proper interval graphs and the realization it defines. In 
sections 3 and 4 we present the incremental algorithm. In section 5 we extend the 
incremental algorithm to a fully dynamic algorithm for proper interval graph recognition 
and representation. We also derive an optimal decremental algorithm. In section 6 we 
give a fully dynamic algorithm for maintaining connectivity in proper interval graphs. 
Finally, in section 7 we prove a lower bound on the amortized time per operation of a 
fully dynamic algorithm for recognizing proper interval graphs. For lack of space, some 
of the proofs and some of the algorithmic details are omitted. 

2 Preliminaries 

Let G = (V, E) be a graph. We denote its set V of vertices also by V(G) and its set E 
of edges also by E(G). For a vertex v ∈ V we define N(v) := {u ∈ V : (u, v) ∈ E} 
and N[v] := N(v) ∪ {v}. Let R be an equivalence relation on V defined by uRv iff 
N[u] = N[v]. Each equivalence class of R is called a block of G. Note that every block 
of G is a complete subgraph of G. The size of a block is the number of vertices in it. 
Two blocks A and B are neighbors in G if some (and hence all) vertices a ∈ A, b ∈ B 
are adjacent in G. A straight enumeration of G is a linear ordering Φ of the blocks in G, 
such that for every block, the block and its neighboring blocks are consecutive in Φ. 
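
The definitions of blocks and straight enumerations can be checked by brute force on very small graphs. The following Python sketch is only an illustration of the definitions, not the algorithm of this paper; the function names are ours, and the exhaustive search over orderings is exponential and meant for toy inputs only.

```python
from collections import defaultdict
from itertools import permutations


def blocks(vertices, edges):
    """Group vertices by identical closed neighborhood N[v]."""
    closed = {v: {v} for v in vertices}
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    groups = defaultdict(set)
    for v in vertices:
        groups[frozenset(closed[v])].add(v)
    return [frozenset(g) for g in groups.values()]


def is_straight_enumeration(order, edges):
    """order: sequence of blocks. Each block together with its
    neighboring blocks must occupy consecutive positions."""
    pos = {v: i for i, block in enumerate(order) for v in block}
    near = [{i} for i in range(len(order))]
    for u, v in edges:
        near[pos[u]].add(pos[v])
        near[pos[v]].add(pos[u])
    return all(max(s) - min(s) + 1 == len(s) for s in near)


def has_straight_enumeration(vertices, edges):
    """Exhaustive search over block orderings (tiny graphs only)."""
    bs = blocks(vertices, edges)
    return any(is_straight_enumeration(p, edges) for p in permutations(bs))
```

On a path the natural order of the vertices is a straight enumeration, whereas a claw and a chordless 4-cycle admit none.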

Let Φ = B_1 < ... < B_l be an ordering of the blocks of G. For any 1 ≤ i < j ≤ l, 
we say that B_i is ordered to the left of B_j, and that B_j is ordered to the right of B_i. A 
chordless cycle is an induced cycle of length greater than 3. A claw is an induced K_{1,3}. 
A graph is claw-free if it does not contain an induced claw. For basic definitions in graph 
theory see, e.g., [7]. 

The following are some useful facts about interval and proper interval graphs. 
Theorem 1. ([12]) An interval graph contains no chordless cycle. 

Theorem 2. ([15]) A graph is a proper interval graph iff it is interval and claw-free. 

Theorem 3. ([3]) A graph is a proper interval graph iff it has a straight enumeration. 

Lemma 4 (“The umbrella property”). Let Φ be a straight enumeration of a connected 
proper interval graph G. If A, B and C are blocks of G, such that A < B < C in Φ and 
A is adjacent to C, then B is adjacent to A and to C (see figure 1). 

Fig. 1. The umbrella property 



Let G be a connected proper interval graph and let Φ be a straight enumeration of G. 
It is shown in [3] that a connected proper interval graph has a unique straight enumeration 
up to its full reversal. Define the out-degree of a block B w.r.t. Φ, denoted by o(B), as 
the number of neighbors of B which are ordered to its right in Φ. 

We shall use the following representation: For each connected component of the 
dynamic graph we maintain a straight enumeration (in fact, for technical reasons we 
shall maintain both the enumeration and its reversal). The details of the data structure 
containing this information will be described below. 

This information implicitly defines a realization of the dynamic graph (cf. [3]) as 
follows: Assign to each vertex in block B_i the interval [i, i + o(B_i) + 1 - 1/i]. The 
out-degrees and hence the realization of the graph can be computed from our data structure 
in time O(n). 
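
As a small illustration, the realization can be computed from a straight enumeration and an edge list. The sketch below assumes the interval of a vertex in the i-th block (counting from 1) is [i, i + o(B_i) + 1 - 1/i]; the function names are ours, and the code works from an explicit edge list rather than from the pointer structure described in the text, so it does not reflect the O(n) bound.

```python
from fractions import Fraction


def out_degrees(order, edges):
    """o(B): number of neighbor blocks ordered to the right of B."""
    pos = {v: i for i, block in enumerate(order) for v in block}
    right = [set() for _ in order]
    for u, v in edges:
        i, j = sorted((pos[u], pos[v]))
        if i != j:
            right[i].add(j)
    return [len(s) for s in right]


def realization(order, edges):
    """Map each vertex to a closed interval (left, right) with exact
    rational endpoints; vertices of one block share an interval."""
    o = out_degrees(order, edges)
    intervals = {}
    for i, block in enumerate(order, start=1):
        interval = (Fraction(i), i + o[i - 1] + 1 - Fraction(1, i))
        for v in block:
            intervals[v] = interval
    return intervals
```

The term 1 - 1/i keeps the right endpoints strictly increasing from block to block, so no interval properly contains another.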



3 An Incremental Algorithm for Vertex Addition 

In the following two sections we describe an optimal incremental algorithm for 
recognizing and representing proper interval graphs. The algorithm receives as input a series 
of addition operations to be performed on a graph. Upon each operation the algorithm 
updates its representation of the graph and halts if the current graph is no longer a proper 
interval graph. The algorithm handles each operation in time O(d), where d denotes the 
number of edges involved in the operation. It is assumed that initially the graph is empty, 
or alternatively, that the representation of the initial graph is known. 

A contig of a connected proper interval graph G is a straight enumeration of G. The 
first and the last blocks of a contig are called end-blocks. The rest of the blocks are called 
inner-blocks. 

As mentioned above, each component of the dynamic graph has exactly two contigs 
(which are full reversals of each other) and both are maintained by the algorithm. Each 
operation involves updating the representation. (In the sequel we concentrate on 
describing only one of the two contigs for each component. The second contig is updated in a 
similar way.) 

3.1 The Data Structure 

The following data is kept and updated by the algorithm: 
1. For each vertex we keep the name of the block to which it belongs. 

2. For each block we keep the following: 

a) An end pointer which is null if the block is not an end-block of its contig, and 
otherwise points to the other end-block of that contig. 

b) The size of the block. 

c) Left and right near pointers, pointing to nearest neighbor blocks on the left and 
on the right respectively. 

d) Left and right far pointers, pointing to farthest neighbor blocks on the left and 
on the right respectively. 

e) Left and right self-pointers, pointing to the block. 

f) A counter. 

In the following we shall omit details about the obvious updates to the name of the 
block of a vertex and to the size of a block. 

During the execution of the algorithm we may need to update many far pointers 
pointing to a certain block, so that they point to another block. In order to be able to 
do that in O(1) time we use the technique of nested pointers: We make the far pointers 
point to a location whose content is the address of the block to which the far pointers 
should point. The role of this special location will be served by our self-pointers. The 
value of the left and right self-pointers of B is always the address of B. When we say 
that a certain left (right) far pointer points to B, we mean that it points to a left (right) 
self-pointer of B. Let A and B be blocks. In order to change all left (right) far pointers 
pointing to A so that they point to B, we require that no left (right) far pointer points to 
B. If this is the case, we simply exchange the left (right) self-pointer of A with the left 
(right) self-pointer of B. This means that: (1) The previous left (right) self-pointer of A 
is made to point to B, and the algorithm records it as the new left (right) self-pointer of 
B; (2) The previous left (right) self-pointer of B is made to point to A, and the algorithm 
records it as the new left (right) self-pointer of A. 
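
The nested-pointer exchange can be sketched in a few lines of Python. The class and function names below are ours and only the left self-pointer is modelled; the right one is symmetric. A Cell plays the role of the special memory location, and a far pointer is a reference to a Cell rather than to a block.

```python
class Cell:
    """A self-pointer: a location whose content is the block it denotes."""

    def __init__(self, block):
        self.block = block


class Block:
    def __init__(self, name):
        self.name = name
        self.left_self = Cell(self)  # left self-pointer of this block
        self.far_left = None         # a Cell: left self-pointer of some block


def redirect_left_far(a, b):
    """Make every left far pointer that denotes `a` denote `b`, in O(1).

    Precondition (as in the text): no left far pointer denotes `b`.
    """
    # Exchange the two cells and record each as the other's self-pointer.
    a.left_self, b.left_self = b.left_self, a.left_self
    a.left_self.block = a
    b.left_self.block = b
```

After the call, every block whose far_left held the old cell of a resolves to b through far_left.block, although none of those far pointers was touched individually.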

We shall use the following notation: For a block B we denote its address in the 
memory by &B. When we set a far pointer to point to a left or to a right self-pointer of 
B we will abbreviate and set it to &B. We denote the left and right near pointers of B by 
N_l(B) and N_r(B) respectively. We denote the left and right far pointers of B by F_l(B) 
and F_r(B) respectively. We denote its end pointer by E(B). In the sequel we often refer 
to blocks by their addresses. For example, if A and B are blocks, and N_r(A) = &B, we 
sometimes refer to B by N_r(A). When it is clear from the context, we also use a name 
of a block to denote any vertex in that block. Given a contig Φ we denote its reversal by 
Φ^R. 

In general when performing an operation, we denote the graph before the operation 
is carried out by G, and the graph after the operation is carried out by G'. 

3.2 The Impact of a New Vertex 

In the following we describe the changes made to the representation of the graph in case 
G' is formed from G by the addition of a new vertex v of degree d. We also give some 
necessary and some sufficient conditions for deciding whether G' is proper interval. 

Let B be a block of G. We say that v is adjacent to B if v is adjacent to some vertex 
in B. We say that v is fully adjacent to B if v is adjacent to every vertex in B. We say 
that v is partially adjacent to B if v is adjacent to B but not fully adjacent to B. 

The following lemmas characterize, assuming that G' is proper interval, the 
adjacencies of the new vertex. 

Lemma 5. If G' is a proper interval graph then v can have neighbors in at most two 
connected components of G. 

Lemma 6. [3] Let C be a connected component of G containing neighbors of v. Let 
B_1 < ... < B_k be a contig of C. Assume that G' is proper interval and let 1 ≤ a < 
b < c ≤ k. Then the following properties are satisfied: 

1. If v is adjacent to B_a and to B_c, then v is fully adjacent to B_b. 

2. If v is adjacent to B_b and not fully adjacent to B_a and to B_c, then B_a is not adjacent 
to B_c. 

3. If b = a + 1, c = b + 1 and v is adjacent to B_b, then v is fully adjacent to B_a or to 
B_c. 

One can view a contig Φ of a connected proper interval graph C as a weak linear 
order <_Φ on the vertices of C, where x <_Φ y iff the block containing x is ordered in 
Φ to the left of the block containing y. We say that Φ' is a refinement of Φ if for every 
x, y ∈ V(C), x <_Φ y implies x <_Φ' y (since a contig can be reversed, we also allow 
complete reversal of Φ). 

Lemma 7. If C is a connected induced subgraph of a proper interval graph G', Φ is a 
contig of C and Φ' is a straight enumeration of G', then Φ' is a refinement of Φ. 

Note that whenever v is partially adjacent to a block B in G, the addition of v 
will cause B to split into two blocks of G', namely B \ N(v) and B ∩ N(v). Otherwise, 
if B is a block of G to which v is either fully adjacent or not adjacent, then B is also a 
block of G'. 

Corollary 8. If B is a block of G to which v is partially adjacent, then B \ N(v) and 
B ∩ N(v) occur consecutively in a straight enumeration of G'. 

Lemma 9. Let C be a connected component of G containing neighbors of v. Let the set 
of blocks in C which are adjacent to v be {B_1, ..., B_k}. Assume that in a contig of C, 
B_1 < ... < B_k. If G' is proper interval then the following properties are satisfied: 

1. B_1, ..., B_k are consecutive in C. 

2. If k ≥ 3 then v is fully adjacent to B_2, ..., B_{k-1}. 

3. If v is adjacent to a single block B_1 in C, then B_1 is an end-block. 

4. If v is adjacent to more than one block in C and has neighbors in another component, 
then B_1 is adjacent to B_k, and one of B_1 or B_k is an end-block to which v is fully 
adjacent, while the other is an inner-block. 

Proof. Claims 1 and 2 follow directly from part 1 of Lemma 6. Claim 3 follows from 
part 3 of Lemma 6. To prove the last part of the lemma let us denote the other component 
containing neighbors of v by D. Examine the induced connected subgraph H of G' 
whose set of vertices is V(H) = {v} ∪ V(C) ∪ V(D). H is proper interval as an 
induced subgraph of G'. It is composed of three types of blocks: Blocks whose vertices 
are from V(C), which we will call henceforth C-blocks; blocks whose vertices are from 
V(D), which we will call henceforth D-blocks; and {v}, which is a block of H since 
H \ {v} is not connected. All blocks of C remain intact in H, except B_1 and B_k which 
might split into B_j \ N(v) and B_j ∩ N(v), for j = 1, k. 

Surely in a contig of H, C-blocks must be ordered completely before or completely 
after D-blocks. Let Φ denote a contig of H, in which C-blocks are ordered before 
D-blocks. Let X denote the rightmost C-block in Φ. By the umbrella property, X < {v} 
and moreover, X is adjacent to v. By Lemma 7, Φ is a refinement of a contig of C. 
Hence, X ⊆ B_1 or X ⊆ B_k (more precisely, X = B_1 ∩ N(v) or X = B_k ∩ N(v)). 
Therefore, one of B_1 or B_k is an end-block. 

W.l.o.g. X ⊆ B_k. Suppose to the contrary that v is not fully adjacent to B_k. Then 
by Lemma 7 we have B_{k-1} ∩ N(v) < B_k \ N(v) < {v} in Φ, contradicting the 
umbrella property. B_1 must be adjacent to B_k, or else G' contains a claw consisting of 
v, B_1, B_k and a vertex from V(D) ∩ N(v). It remains to show that B_1 is an inner-block. 
Suppose it is an end-block. Since B_1 and B_k are adjacent, C contains a single block B_1, 
a contradiction. Thus, claim 4 is proved. ∎ 

3.3 The Algorithm 

In our algorithm we rely on the incremental algorithm of Deng, Hell and Huang [3], which 
we call henceforth the DHH algorithm. This algorithm handles the insertion of a new 
vertex into a graph in O(d) time, provided that all its neighbors are in the same connected 
component, changing the straight enumeration of this component appropriately. We refer 
the reader to [3] for more details. 

We perform the following upon a request for adding a new vertex v. For each neighbor 
u of v we add one to the counter of the block containing u. We call a block full if its counter 
equals its size, empty if its counter equals zero, and partial otherwise. In order to find a 
set of consecutive blocks which contain neighbors of v, we pick arbitrarily a neighbor 
of v and march down the enumeration of blocks to the left using the left near 
pointers. We continue till we hit an empty block or till we reach the end of the contig. 
We do the same to the right and this way we discover a maximal sequence of nonempty 
blocks in that component which contain neighbors of v. We call this maximal sequence 
a segment. Only the two extreme blocks of the segment are allowed to be partial or else 
we fail (by Lemma 9(2)). 
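
The counting and marching step can be sketched as follows. This is a minimal Python illustration under our own naming (Block, make_contig, find_segment); it keeps only the counter and the near pointers, assumes a nonempty neighbor list without repetitions, and resets the counters at the end, a detail the text does not spell out.

```python
class Block:
    def __init__(self, vertices):
        self.vertices = set(vertices)
        self.counter = 0   # neighbors of the new vertex inside this block
        self.left = None   # left near pointer
        self.right = None  # right near pointer


def make_contig(groups):
    """Link blocks left to right; return (blocks, vertex -> block map)."""
    blocks = [Block(g) for g in groups]
    for a, b in zip(blocks, blocks[1:]):
        a.right, b.left = b, a
    block_of = {v: b for b in blocks for v in b.vertices}
    return blocks, block_of


def find_segment(block_of, neighbors):
    """Return (segment, inner_full, covers_all) for the new vertex."""
    touched = []
    for u in neighbors:
        b = block_of[u]
        if b.counter == 0:
            touched.append(b)
        b.counter += 1
    first = last = block_of[neighbors[0]]
    while first.left is not None and first.left.counter > 0:
        first = first.left
    while last.right is not None and last.right.counter > 0:
        last = last.right
    segment = [first]
    while segment[-1] is not last:
        segment.append(segment[-1].right)
    # Only the two extreme blocks of the segment may be partial.
    inner_full = all(b.counter == len(b.vertices) for b in segment[1:-1])
    covers_all = sum(b.counter for b in segment) == len(neighbors)
    for b in touched:  # reset so that the next insertion starts from zero
        b.counter = 0
    return segment, inner_full, covers_all
```

Every block visited while marching is nonempty, so the walk touches at most d blocks, in line with the O(d) bound stated for the operation.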

If the segment we found contains all neighbors of v then we can use the DHH 
algorithm in order to insert v into G, updating our internal data structure accordingly. 
Otherwise, by Lemmas 5 and 9(1) there could be only one more segment which contains 
neighbors of v. In that case, exactly one extreme block in each segment is an end-block 
to which v is fully adjacent (if the segment contains more than one block), and the two 
extreme blocks in each segment are adjacent, or else we fail (by Lemma 9(3,4)). 

We proceed as above to find a second segment containing neighbors of v. We can 
make sure that the two segments are from two different contigs by checking that their 
end-blocks do not point to each other. We also check that conditions 3 and 4 in Lemma 9 
are satisfied. If the two segments do not cover all neighbors of v, we fail. 

If v is adjacent to vertices in two distinct components C and D, then we should 
merge their contigs. Let Φ = B_1 < ... < B_k and Φ^R be the two contigs of C. Let 
Ψ = B'_1 < ... < B'_l and Ψ^R be the two contigs of D. The way the merge is performed 
depends on the blocks to which v is adjacent. If v is adjacent to B_k and to B'_1, then 
by the umbrella property the two new contigs (up to refinements described below) are 
Φ < {v} < Ψ and Ψ^R < {v} < Φ^R. In the following we describe the necessary changes 
to our data structure in case these are the new contigs. The three other cases are handled 
similarly. 

- Block enumeration: We merge the two enumerations of blocks and put a new block 
{v} in-between the two contigs. Let the leftmost block adjacent to v in the new 
ordering Φ < {v} < Ψ be B_i and let the rightmost block adjacent to v be B'_j. If 
B_i is partial we split it into two blocks B̂_i = B_i \ N(v) and B_i = B_i ∩ N(v) 
in this order. If B'_j is partial we split it into two blocks B'_j = B'_j ∩ N(v) and 
B̂'_j = B'_j \ N(v) in this order. 

- End pointers: We set E(B_1) = E(B'_1) and E(B'_l) = E(B_k). We then nullify the 
end pointers of B_k and B'_1. 

- Near pointers: We update N_l({v}) = &B_k, N_r({v}) = &B'_1, N_r(B_k) = &{v} 
and N_l(B'_1) = &{v}. Let B_0 = ∅. In case B_i was split we update N_r(B̂_i) = 
&B_i, N_l(B_i) = &B̂_i, N_l(B̂_i) = &B_{i-1} and N_r(B_{i-1}) = &B̂_i. Similar updates 
are made in case B'_j was split to the near pointers of B'_j, B̂'_j and B'_{j+1}. 

- Far pointers: If B_i was split we set F_l(B̂_i) = F_l(B_i), F_r(B̂_i) = &B_k and exchange 
the left self-pointer of B_i with the left self-pointer of B̂_i. If B'_j was split we 
set F_r(B̂'_j) = F_r(B'_j), F_l(B̂'_j) = &B'_1 and exchange the right self-pointer of 
B'_j with the right self-pointer of B̂'_j. In addition, we set all right far pointers of 
B_i, ..., B_k and all left far pointers of B'_1, ..., B'_{j-1}, B'_j to &{v} (in O(d) 
time). Finally, we set F_l({v}) = &B_i and F_r({v}) = &B'_j. 



4 An Incremental Algorithm for Edge Addition 

In this section we show how to handle the addition of a new edge (u, v) in O(1) time. 
We characterize the cases for which G' = G ∪ {(u, v)} is proper interval and show how 
to efficiently detect them, and how to update our representation of the graph. 

Lemma 10. If u and v are in distinct components in G, then G' is proper interval iff u 
and v were in end-blocks of their respective contigs. 

Proof. To prove the 'only if' part let us examine the graph H = G' \ {u} = G \ {u}. H is 
proper interval as an induced subgraph of G. If G' is proper interval, then by Lemma 9(3) 
v must be in an end-block of its contig, since u is not adjacent to any other vertex in the 
component containing v. The same argument applies to u. 

To prove the 'if' part we give a straight enumeration of the new connected component 
containing u and v in G'. Denote by C and D the components containing u and v 
respectively. Let B_1 < ... < B_k be a contig of C, such that u ∈ B_k. Let B'_1 < ... < B'_l 
be a contig of D, such that v ∈ B'_1. Then B_1 < ... < B_k \ {u} < {u} < {v} < 
B'_1 \ {v} < ... < B'_l is a straight enumeration of the new component. ∎ 

We can check in O(1) time if u and v are in end-blocks of distinct contigs. If this is 
the case, we update our data structure according to the straight enumeration given in the 
proof of Lemma 10 in O(1) time. 
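
The enumeration from the proof of Lemma 10 can be written out on plain lists of vertex sets. The following Python sketch uses names of our own and copies lists, so it illustrates the resulting order only and does not achieve the O(1) bound of the pointer-based structure; empty blocks that arise when an end-block is a singleton are dropped.

```python
def orient(contig, x, want_last):
    """Return the contig (or its reversal) with x in the requested
    end-block, or None if x lies in no end-block."""
    if x in contig[-1 if want_last else 0]:
        return list(contig)
    if x in contig[0 if want_last else -1]:
        return list(reversed(contig))
    return None


def add_edge_between_components(contig_c, contig_d, u, v):
    """Straight enumeration after adding (u, v), with u in component C
    and v in component D; None if the new graph is not proper interval."""
    c = orient(contig_c, u, want_last=True)
    d = orient(contig_d, v, want_last=False)
    if c is None or d is None:
        return None
    left = c[:-1] + [c[-1] - {u}, {u}]
    right = [{v}, d[0] - {v}] + d[1:]
    return [block for block in left + right if block]
```

For example, joining the contigs {1} < {2, 3} and {4, 5} < {6} by the edge (3, 4) yields six singleton blocks in the order 1, 2, 3, 4, 5, 6.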

It remains to handle the case where u and v were in the same connected component 
C in G. If N(u) = N(v) then by the umbrella property it follows that C contains only 
three blocks which are merged into a single block in G'. In this case G' is proper interval 
and updates to the internal data structure are trivial. The following lemma analyses the 
case where N(u) ≠ N(v). 

Lemma 11. Let B_1 < ... < B_k be a contig of C, such that u ∈ B_i and v ∈ B_j 
for some 1 ≤ i < j ≤ k. Assume that N(u) ≠ N(v). Then G' is proper interval iff 
F_r(B_i) = B_{j-1} and F_l(B_j) = B_{i+1} in G. 

Proof. To prove the 'only if' part assume that G' is proper interval. Since B_i and B_j 
are not adjacent, F_r(B_i) ≤ B_{j-1} and F_l(B_j) ≥ B_{i+1}. Suppose to the contrary that 
F_r(B_i) < B_{j-1}. Let z ∈ B_{j-1}. If in addition F_l(B_j) = F_l(B_{j-1}) then N[v] ⊃ N[z] 
(this is a strict containment). As v and z are in distinct blocks, there exists a vertex 
b ∈ N[v] \ N[z]. But then, v, b, z, u induce a claw in G', a contradiction. Hence, 
F_l(B_j) > B_{i+1} and so F_r(B_{i+1}) < B_j. Let x ∈ B_{i+1} and let y ∈ F_r(B_{i+1}). Since u 
and x are in distinct blocks, either (u, y) ∉ E(G) or there is a vertex a ∈ N[u] \ N[x] 
(or both). In the first case, v, u, x, y and the vertices of the shortest path from y to v 
induce a chordless cycle in G'. In the second case u, a, x, v induce a claw in G'. Hence, 
in both cases we arrive at a contradiction. The proof that F_l(B_j) = B_{i+1} is symmetric. 

To prove the 'if' part we shall provide a straight enumeration of C ∪ {(u, v)}. If 
B_i = {u}, F_r(B_{j-1}) = F_r(B_j) and F_l(B_{j-1}) = B_i (i.e., N[v] = N[B_{j-1}] in G'), 
we move v from B_j to B_{j-1}. Similarly, if B_j contained only v, F_l(B_{i+1}) = F_l(B_i) 
and F_r(B_{i+1}) = B_j (i.e., N[u] = N[B_{i+1}] in G'), we move u from B_i to B_{i+1}. If u 
was not moved and B_i ⊃ {u}, we split B_i into B_i \ {u}, {u} in this order. If v was not 
moved and B_j ⊃ {v}, we split B_j into {v}, B_j \ {v} in this order. It is easy to see that 
the result is a straight enumeration of C ∪ {(u, v)}. ∎ 

We can check in O(1) time if the condition in Lemma 11 holds. If this is the case, 
we change our data structure so as to reflect the new straight enumeration given in the 
proof of Lemma 11. This can be done in O(1) time, in a similar fashion to the update 
technique described in Section 3.3. The details are omitted here. The following theorem 
summarizes the results of Sections 3 and 4. 

Theorem 12. The incremental proper interval graph representation problem is solvable 
in O(1) time per added edge. 

5 The Fully Dynamic Algorithm 

In this section we give a fully dynamic algorithm for recognizing and representing 
proper interval graphs. The algorithm performs each operation in O(d + log n) time, 
where d denotes the number of edges involved in the operation. It supports four types 
of operations: Adding a vertex, adding an edge, deleting a vertex and deleting an edge. 
It is based on the same ideas used in the incremental algorithm. The main difficulty in 
extending the incremental algorithm to handle all types of operations is updating the end 
pointers of blocks when deletions are allowed. To bypass this problem we do not keep 
end pointers at all. Instead, we maintain the connected components of G, and use this 
information in our algorithm. In the next section we show how to maintain the connected 
components of G in O(log n) time per operation. We describe below how each operation 
is handled by the algorithm. 

5.1 The Addition of a Vertex or an Edge 

These operations are handled in essentially the same way as done by the incremental 
algorithm. However, in order to check if the end-blocks of two segments are in distinct 
components, we query our data structure of connected components (in O(log n) time). 
Similarly, in order to check if the endpoints of an added edge are in distinct components, 
we check if their corresponding blocks are in distinct components (in O(log n) time). 

5.2 The Deletion of a Vertex 

We show next how to update the contigs of G after deleting a vertex v of degree d. Note 
that G' is proper interval as an induced subgraph of G. Denote by X the block containing 
v. If X ⊃ {v}, then the only change needed is to delete v. We hence concentrate on the 
case that X = {v}. We can find in O(d) time the segment of blocks which includes X 
and all its neighbors. Let the contig containing X be B_1 < ... < B_k and let the blocks 
of the segment be B_i < ... < B_j, where X = B_l for some 1 ≤ i ≤ l ≤ j ≤ k. Let 
B_0 = ∅, B_{k+1} = ∅. We make the following updates: 

- Block enumeration: If 1 < i < l, we check whether B_i can be merged with B_{i-1}. 
If F_l(B_i) = F_l(B_{i-1}), F_r(B_i) = B_l and F_r(B_{i-1}) = B_{l-1}, we merge them by 
moving all vertices from B_i to B_{i-1} (in O(d) time) and deleting B_i. If l < j < k we 
act similarly w.r.t. B_j and B_{j+1}. Finally, we delete B_l. If 1 < l < k and B_{l-1} and 
B_{l+1} are non-adjacent, then by the umbrella property they are no longer in the same 
connected component, and the contig should be split into two contigs, one ending 
at B_{l-1} and one beginning at B_{l+1}. 

- Near pointers: If B_i and B_{i-1} were merged, we update N_r(B_{i-1}) = &B_{i+1} and 
N_l(B_{i+1}) = &B_{i-1}. Similar updates should be made w.r.t. B_{j-1} and B_{j+1} in case 
B_j and B_{j+1} were merged. If the contig is split, we nullify N_r(B_{l-1}) and N_l(B_{l+1}). 
Otherwise, we update N_r(B_{l-1}) = &B_{l+1} and N_l(B_{l+1}) = &B_{l-1}. 

- Far pointers: If B_i and B_{i-1} were merged, we exchange the right self-pointer of B_i 
with the right self-pointer of B_{i-1}. Similar changes should be made w.r.t. B_j and 
B_{j+1}. We also set all right far pointers previously pointing to B_l to &B_{l-1}, and all 
left far pointers previously pointing to B_l to &B_{l+1} (in O(d) time). 

Note that these updates take O(d) time and require no knowledge about the connected 
components of G. 

5.3 The Deletion of an Edge 

Let (u, v) be an edge of G to be deleted. Let C denote the connected component of G 
containing u and v, and let B_1 < ... < B_k be a contig of C. If k = 1 then B_1 is split 
into {u}, B_1 \ {u, v} and {v}, resulting in a straight enumeration of G'. Updates are 
trivial in this case. If N[u] = N[v] then one can show that G' = G \ {(u, v)} is a proper 
interval graph iff C was a clique, so again k = 1. We assume henceforth that k > 1 and 
N[u] ≠ N[v]. 

Let u ∈ B_i and v ∈ B_j, where w.l.o.g. i < j. If B_i = {u}, B_j = {v}, j = i + 1 and 
B_i, B_j were far neighbors of each other, then we should split the contig into two 
contigs, one ending at B_i and the other beginning at B_j. Otherwise, updates to the 
straight enumeration are derived from the following lemma. 

Lemma 13. Let B_1 < ... < B_k be a contig of C, such that u ∈ B_i and v ∈ B_j 
for some 1 ≤ i < j ≤ k. Assume that N[u] ≠ N[v]. Then G' is proper interval iff 
F_r(B_i) = B_j and F_l(B_j) = B_i in G. 

Proof. Assume that G' is proper interval. We will show that F_r(B_i) = B_j. The proof 
that F_l(B_j) = B_i is symmetric. Since B_i and B_j are adjacent in G, F_r(B_i) ≥ B_j. 
Suppose to the contrary that F_r(B_i) > B_j. Let x ∈ F_r(B_i). Since x and v are in distinct 
blocks, either there is a vertex a ∈ N[v] \ N[x] or there is a vertex b ∈ N[x] \ N[v] (or 
both). In the first case, by the umbrella property (a, u) ∈ E(G) and therefore u, x, v, a 
induce a chordless cycle in G'. In the second case, x, b, u, v induce a claw in G'. Hence, 
in both cases we arrive at a contradiction. 

To prove the opposite direction we give a straight enumeration of C \ {(u, v)}. If 
B_j = {v}, F_l(B_{i-1}) = F_l(B_i) and F_r(B_{i-1}) = B_{j-1} (i.e., N[u] = N[B_{i-1}] in G'), 
we move u into B_{i-1}. If B_i contained only u, F_r(B_{j+1}) = F_r(B_j) and F_l(B_{j+1}) = 
B_{i+1} (i.e., N[v] = N[B_{j+1}] in G'), we move v into B_{j+1}. If u was not moved and 
B_i ⊃ {u}, then B_i is split into {u}, B_i \ {u} in this order. If v was not moved and 
B_j ⊃ {v}, then B_j is split into B_j \ {v}, {v} in this order. The result is a contig of 
C \ {(u, v)}. ∎ 

If the conditions of Lemma 13 are fulfilled, one has to update the data structure 
according to its proof. These updates require no knowledge about the connected 
components of G, and it can be shown that they take O(1) time. Hence, from Sections 5.2 
and 5.3 we obtain the following result: 

Theorem 14. The decremental proper interval graph representation problem is solvable 
in O(1) time per removed edge. 

6 Maintaining the Connected Components 

In this section we describe a fully dynamic algorithm for maintaining connectivity in a 
proper interval graph G in O(log n) time per operation. The algorithm receives as input 
a series of operations to be performed on a graph, which can be any of the following: 
Adding a vertex, adding an edge, deleting a vertex, deleting an edge or querying if 
two blocks are in the same connected component. The algorithm depends on a data 
structure which includes the blocks and the contigs of the graph. It hence interacts with 
the proper interval graph representation algorithm. In response to an update request, 
changes are made to the representation of the graph based on the structure of its connected 
components prior to the update. Only then are the connected components of the graph 
updated. 

Let us denote by B(G) the block graph of G, that is, a graph in which each vertex 
corresponds to a block of G and two vertices are adjacent iff their corresponding blocks 
are adjacent in G. The algorithm maintains a spanning forest F of B(G). In order to 
decide if two blocks are in the same connected component, the algorithm checks if they 
belong to the same tree in F. 

The key idea is to design F so that it can be efficiently updated upon a modification in 
G. We define the edges of F as follows: For every two vertices u and v in B(G), (u, v) ∈ 
E(F) iff their corresponding blocks are consecutive in a contig of G. Consequently, 
each tree in F is a path representing a contig. The crucial observation about F is that 
an addition or a deletion of a vertex or an edge in G induces O(1) modifications to the 
vertices and edges of F. This can be seen by noting that each modification of G induces 
O(1) updates to near pointers in our representation of G. 

It remains to show how to implement a spanning forest in which trees may be cut 
when an edge is deleted from F, linked when an edge is inserted into F, and which can 
be queried for the tree to which a given vertex belongs. All these operations are supported 
by the ET-tree data structure of [8] in O(log n) time per operation. 
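
To make the interface concrete, the following Python sketch maintains a forest of paths with link, cut and a same-tree query. It is a naive stand-in written for illustration, with names of our own: queries walk to the left end of a path and therefore take time linear in its length, whereas the ET-trees of [8] support all three operations in O(log n) time.

```python
class PathForest:
    """Forest in which every tree is a path, stored via near pointers."""

    def __init__(self):
        self.left = {}
        self.right = {}

    def add(self, x):
        """Add an isolated node."""
        self.left[x] = None
        self.right[x] = None

    def leftmost(self, x):
        """Identify the path containing x by its left end."""
        while self.left[x] is not None:
            x = self.left[x]
        return x

    def connected(self, x, y):
        return self.leftmost(x) == self.leftmost(y)

    def link(self, a, b):
        """Insert edge (a, b): a is a right end, b a left end of another path."""
        if self.right[a] is not None or self.left[b] is not None:
            raise ValueError("a must be a right end and b a left end")
        if self.connected(a, b):
            raise ValueError("a and b are already on the same path")
        self.right[a] = b
        self.left[b] = a

    def cut(self, a):
        """Delete the edge between a and its right neighbor, if any."""
        b = self.right[a]
        if b is not None:
            self.right[a] = None
            self.left[b] = None
```

Nodes stand for blocks here; a contig split or merge in the representation corresponds to a single cut or link.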

We are now ready to state our main result: 

Theorem 15. The fully dynamic proper interval graph representation problem is 
solvable in O(d + log n) time per modification involving d edges. 

7 The Lower Bound 

In this section we prove a lower bound of Ω(log n/(log log n + log b)) amortized time 
per edge operation for fully dynamic proper interval graph recognition in the cell probe 
model of computation with word-size b [16]. 

Fredman and Saks [6] proved a lower bound of Ω(log n/(log log n + log b)) 
amortized time per operation for the following parity prefix sum (PPS) problem: Given an 
array of integers A[1], ..., A[n] with initial value zero, execute an arbitrary sequence 
of Add(t) and Sum(t) operations, where an Add(t) increases A[t] by 1, and Sum(t) 
returns (A[1] + ... + A[t]) mod 2. Fredman and Henzinger [5] showed that the same lower 
bound applies to the problem of maintaining connectivity in general graphs, by showing 
a reduction from a modified PPS problem, called helpful parity prefix sum, for which 
they proved the same lower bound. A slight change to their reduction yields the same 
lower bound for the problem of maintaining connectivity in proper interval graphs, as 
the graph built in the reduction is a union of two paths and therefore proper interval. 
Using a similar construction we can prove the following result: 

Theorem 16. Fully dynamic proper interval recognition takes Ω(log n/(log log n + 
log b)) amortized time per edge operation in the cell probe model with word-size b. 
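
For concreteness, the parity prefix sum problem used in the reduction can be stated operationally. The Python sketch below is a direct, naive rendering of the definition with names of our own; it says nothing about the reduction itself, and each Sum costs time linear in t here.

```python
class ParityPrefixSum:
    """Array A[1..n], initially zero, with Add(t) and Sum(t)."""

    def __init__(self, n):
        self.a = [0] * (n + 1)  # index 0 is unused so that A is 1-based

    def add(self, t):
        """Add(t): increase A[t] by 1."""
        self.a[t] += 1

    def sum(self, t):
        """Sum(t): parity of A[1] + ... + A[t]."""
        return sum(self.a[1:t + 1]) % 2
```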

Abstract. We propose a fast methodology for encoding graphs with information- 
theoretically minimum numbers of bits. The methodology is applicable to general 
classes of graphs; this paper focuses on simple planar graphs. Specifically, a graph 
with property π is called a π-graph. If π satisfies certain properties, then an n-node 
π-graph G can be encoded by a binary string X such that (1) G and X can be 
obtained from each other in O(n log n) time, and (2) X has at most β(n) + o(β(n)) 
bits for any function β(n) = Ω(n) so that there are at most 2^{β(n)+o(β(n))} distinct 
n-node π-graphs. Examples of such π include all conjunctions of the following 
sets of properties: (1) G is a planar graph or a plane graph; (2) G is directed or 
undirected; (3) G is triangulated, triconnected, biconnected, merely connected, or 
not required to be connected; and (4) G has at most ℓ_1 (respectively, ℓ_2) distinct 
node (respectively, edge) labels. These examples are novel applications of small 
cycle separators of planar graphs and settle several problems that have been open 
since Tutte's census series were published in the 1960s. 



1 Introduction 

Let G be a graph with n nodes and m edges. This paper studies the problem of encoding 
G into a binary string X with the requirement that X can be decoded to reconstruct G. 
We propose a fast methodology for designing a coding scheme such that the bit count of 
X is information-theoretically optimal. The methodology is applicable to general classes 
of graphs; this paper focuses on simple planar graphs, i.e., planar graphs that are free of 
self-loops and multiple edges. Specifically, a graph with property π is called a π-graph. 
If π satisfies certain properties, then we can obtain an X such that (1) G and X can be 
computed from each other in O(n log n) time and (2) X has at most β(n) + o(β(n)) bits 
for any function β(n) = Ω(n) so that there are at most 2^{β(n)+o(β(n))} distinct n-node 
π-graphs. 

Examples of suitable π include all conjunctions of the following sets of properties: (1) 
G is a planar graph or a plane graph; (2) G is directed or undirected; (3) G is triangulated, 
triconnected, biconnected, merely connected, or not required to be connected; and (4) G 
has at most ℓ_1 (respectively, ℓ_2) distinct node (respectively, edge) labels. For instance, 
π can be the property of being a directed unlabeled biconnected plane graph. These 
examples are novel applications of small cycle separators of planar graphs [12,11]. 
They also settle several problems that have been open since Tutte's census series were 
published in the 1960s [18,20,17,19,21]. Tutte proved that there are 2^{β(m)} distinct 
m-edge plane triangulations where β(m) = (8/3 - log_2 3)m + o(m) ≈ 1.08m + o(m) 
[18] and that there are 2^{2m+o(m)} distinct m-edge n-node triconnected plane graphs that 
may be non-simple [20]. Note that the rooted trees are the only other nontrivial class of 
graphs with a known polynomial-time information-theoretically optimal coding scheme, 
which encodes a tree as nested parentheses using 2(n - 1) bits. 
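
The nested-parentheses code for trees is easy to state explicitly. The Python sketch below assumes rooted ordered trees given as nested lists (a node is the list of its children) and writes an opening parenthesis as 1 and a closing one as 0; the representation and the function names are our own choices for illustration.

```python
def encode(tree):
    """Return a bit string of length 2(n - 1) for a tree with n nodes."""
    bits = []

    def visit(node):
        for child in node:
            bits.append("1")  # descend along the edge to this child
            visit(child)
            bits.append("0")  # return along the same edge

    visit(tree)
    return "".join(bits)


def decode(bits):
    """Rebuild the nested-list tree from its bit string."""
    root = []
    stack = [root]
    for bit in bits:
        if bit == "1":
            child = []
            stack[-1].append(child)
            stack.append(child)
        else:
            stack.pop()
    return root
```

Each of the n - 1 edges contributes exactly one 1 and one 0, which gives the 2(n - 1) bit count.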

Previously, Turán [16] used 4m bits to encode a plane graph G that may have 
self-loops. This bit count was improved by Keeler and Westbrook [10] to 3.58m. They also 
gave coding schemes for several families of plane graphs. In particular, they used 1.53m 
bits for a triangulated simple G, and 3m bits for a connected G free of self-loops and 
degree-one nodes. For a simple triangulated G, He, Kao, and Lu [5] improved the bit 
count to (4/3)m + O(1). For a simple G that is triconnected and thus free of degree-one 
nodes, they [5] improved the bit count to at most 2.835m bits. This bit count was later 
reduced to at most ((3 log_2 3)/2)m + O(1) ≈ 2.378m + O(1) by Chuang, Garg, He, Kao, and 
Lu [2]. These coding schemes all take linear time for encoding and decoding, but their 
bit counts are not information-theoretically optimal. 

For applications that require support of certain queries, Jacobson [7] gave an $O(n)$-bit
encoding for a connected and simple planar graph G that supports traversal in $O(\log n)$
time per node visited. Munro and Raman [13] improved this result and gave schemes to
encode binary trees, rooted ordered trees and planar graphs. For a general planar G, they
used $2m + 8n + o(m + n)$ bits while supporting adjacency and degree queries in $O(1)$
time. Chuang et al. [2] reduced this bit count to $2m + (5 + \frac{1}{k})n + o(m + n)$ for any
constant $k > 0$ with the same query support. The bit count can be further reduced if only
$O(1)$-time adjacency queries are supported, or G is simple, triconnected or triangulated.

Encoding problems with additional query support and for other classes of graphs
have also been extensively studied in the literature. For certain graph families, Kannan,
Naor and Rudich [8] gave schemes that encode each node with $O(\log n)$ bits and support
$O(\log n)$-time testing of adjacency between two nodes. For dense graphs and complement
graphs, Kao, Occhiogrosso, and Teng [9] devised two compressed representations
from adjacency lists to speed up basic graph search techniques. Galperin and Wigderson
[4] and Papadimitriou and Yannakakis [15] investigated complexity issues arising from
encoding a graph by a small circuit that computes its adjacency matrix. For labeled planar
graphs, Itai and Rodeh [6] gave an encoding of $\frac{3}{2}n \log n + O(n)$ bits. For unlabeled
general graphs, Naor [14] gave an encoding of $\frac{1}{2}n^2 - n \log n + O(n)$ bits.

Section 2 discusses the general encoding methodology. Sections 3 and 4 use the
methodology to obtain information-theoretically optimal encodings for various classes
of planar graphs. Section 5 concludes the paper with some future research directions.

2 The Encoding Methodology

Let |X| be the number of bits in a binary string X. Let |G| be the number of nodes in a
graph G. Let |S| be the number of elements, counting multiplicity, in a multiset S.

Fact 1 (see [1,3]) Let $X_1, X_2, \ldots, X_k$ be $O(1)$ given binary strings. Let
$n = |X_1| + |X_2| + \cdots + |X_k|$. Then there exists an $O(\log n)$-bit string $\chi$, obtainable
in $O(n)$ time, such that given the concatenation of $\chi, X_1, X_2, \ldots, X_k$, the index of the
first symbol of each $X_i$ in the concatenation can be computed in $O(1)$ time.
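
The idea behind Fact 1 can be illustrated with a simplified sketch. It is not the construction of [1,3]: here the auxiliary string simply stores every length in a fixed-width field, and the decoder is assumed to know the field width and the number of strings. All function names are hypothetical.

```python
def concat_with_aux(strings):
    """Return (chi, body, width): chi stores each length in a width-bit field."""
    n = sum(len(s) for s in strings)
    width = max(1, n.bit_length())         # enough bits to write any length up to n
    chi = "".join(format(len(s), "0{}b".format(width)) for s in strings)
    return chi, "".join(strings), width


def start_index(chi, width, i):
    """Offset of the i-th string (0-based) inside the concatenated body."""
    return sum(int(chi[j * width:(j + 1) * width], 2) for j in range(i))


def extract(chi, body, width, i):
    """Recover the i-th string from the body using the length fields in chi."""
    start = start_index(chi, width, i)
    length = int(chi[i * width:(i + 1) * width], 2)
    return body[start:start + length]
```

With a constant number of strings, this auxiliary string has $O(\log n)$ bits and each offset needs only a constant number of field reads.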

Let $X_1 + X_2 + \cdots + X_k$ denote the concatenation of $\chi, X_1, X_2, \ldots, X_k$ as in Fact 1.
We call $\chi$ the auxiliary binary string for $X_1 + X_2 + \cdots + X_k$.

A graph with property $\pi$ is called a $\pi$-graph. Whether two $\pi$-graphs are equivalent
or distinct depends on $\pi$. For example, let $G_1$ and $G_2$ be two distinct embeddings of
the same planar graph. If $\pi$ is the property of being a planar graph, then $G_1$ and $G_2$ are
two equivalent $\pi$-graphs. If $\pi$ is the property of being a planar embedding, then $G_1$ and
$G_2$ are two distinct $\pi$-graphs. Let $\alpha$ be the number of distinct n-node $\pi$-graphs. Clearly
it takes $\lceil \log_2 \alpha \rceil$ bits to differentiate all n-node $\pi$-graphs.

Let $B(G)$ be the boundary of the exterior face of a plane graph G. Let M be a
table, each of whose entries stores a distinct n-node $\pi$-graph. Define $\mathrm{index}_\pi(G, M)$ to
be the $\lceil \log_2 \alpha \rceil$-bit index of G in M. Each $\pi$-graph is stored in M in a form such that it
takes $2^{n^{O(1)}}$ time to determine whether two $\pi$-graphs in this form are equivalent. For
example, if $\pi$ is the property of being a directed planar graph, then each n-node $\pi$-graph
can be stored in M using an adjacency list, in which each node v keeps a list of the nodes
reached by the directed edges leaving v. If $\pi$ is the property of being an undirected plane
graph, each n-node $\pi$-graph G can be stored in M using an adjacency list of G, $B(G)$, and
the counterclockwise order of the neighbors of each node in the plane embedding of G.
For brevity, we write $\mathrm{index}_\pi(G, M)$ as $\mathrm{index}_\pi(G)$ if M is clear from the context.

Lemma 2. Let $\pi$ be a property such that a table M for n-node $\pi$-graphs described as
above can be obtained in $2^{n^{O(1)}}$ time. Then G and $\mathrm{index}_\pi(G, M)$ can be obtained from
each other in $2^{n^{O(1)}}$ time for any n-node $\pi$-graph G.

Proof. Straightforward.
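
A brute-force version of the table M and of the index can be sketched for very small n. The sketch below is only an illustration under stated assumptions: it takes the property to be "simple undirected unlabeled graph" (a stand-in, chosen because it needs no planarity test), treats two graphs as equivalent when they are isomorphic, and uses hypothetical function names. Its exponential running time is in the spirit of the $2^{n^{O(1)}}$ bound, which is affordable only because the method applies it to tiny graphs.

```python
from itertools import combinations, permutations
from math import ceil, log2


def canonical(n, edges):
    """Smallest relabeled edge tuple over all n! node permutations (brute force)."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def build_table(n):
    """Table M: one canonical form per distinct n-node simple undirected graph."""
    pairs = list(combinations(range(n), 2))
    forms = set()
    for mask in range(1 << len(pairs)):
        edges = [p for i, p in enumerate(pairs) if (mask >> i) & 1]
        forms.add(canonical(n, edges))
    return sorted(forms)


def index_of(table, n, edges):
    """Index of the graph in M, written with ceil(log2(len(M))) bits."""
    width = ceil(log2(len(table))) if len(table) > 1 else 1
    return format(table.index(canonical(n, edges)), "0{}b".format(width))
```

Decoding is a table lookup: `table[int(bits, 2)]` returns the canonical form of the encoded graph.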

Let $G_0$ be an input $n_0$-node $\pi$-graph. Let $\lambda = \log\log\log(n_0)$. The encoding
algorithm is merely a function call $\mathrm{code}_\pi(G_0)$, where $\mathrm{code}_\pi(G)$ is as follows.

Algorithm $\mathrm{code}_\pi(G)$ {
    if $|G| \le \lambda$ then
        return $\mathrm{index}_\pi(G)$
    else {
        compute $\pi$-graphs $G_1$, $G_2$, and a string X, from which G can be recovered;
        return $\mathrm{code}_\pi(G_1) + \mathrm{code}_\pi(G_2) + X$;
    }
}

Clearly, $\mathrm{code}_\pi(G_0)$ can be decoded to recover $G_0$.

Algorithm $\mathrm{code}_\pi(G)$ satisfies the separation property if there exist two constants c
and r, where $0 < c < 1$ and $r > 1$, such that the following conditions hold:

P1. $\max(|G_1|, |G_2|) \le |G|/r$.
P2. $|G_1| + |G_2| = |G| + O(|G|^c)$.
P3. $|X| = O(|G|^c)$.
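
The recursion of the encoding algorithm and its inverse can be sketched in Python as follows. This is only an illustration under stated assumptions: the hooks `size`, `index`, `split`, `unindex`, and `merge` are hypothetical names, nested tuples stand in for the "+" concatenation of Fact 1, and the toy instance at the bottom uses tuples of bits in place of $\pi$-graphs.

```python
def code(G, lam, size, index, split):
    """Mirror of the recursion: index small inputs, otherwise split and recurse."""
    if size(G) <= lam:
        return ("leaf", index(G))
    G1, G2, X = split(G)                   # G must be recoverable from (G1, G2, X)
    return ("node",
            code(G1, lam, size, index, split),
            code(G2, lam, size, index, split),
            X)


def decode(c, unindex, merge):
    """Invert code(): rebuild the two parts, then merge them using X."""
    if c[0] == "leaf":
        return unindex(c[1])
    _, c1, c2, X = c
    return merge(decode(c1, unindex, merge), decode(c2, unindex, merge), X)


# Toy instance (assumption): a "graph" is a tuple of bits, split in half, X empty.
def toy_split(G):
    h = len(G) // 2
    return G[:h], G[h:], ()


def toy_merge(G1, G2, X):
    return G1 + G2


G0 = (1, 0, 1, 1, 0, 0, 1, 0, 1)
c = code(G0, 2, len, lambda g: g, toy_split)
```

In the toy instance the two parts are disjoint, so P2 holds with no overhead; in the schemes of Sections 3 and 4 the parts share a separator and carry extra nodes, which is what the $O(|G|^c)$ terms account for.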

Let $f(|G|)$ be the time required to obtain $\mathrm{index}_\pi(G)$ and G from each other. Let
$g(|G|)$ be the time required to obtain $G_1, G_2, X$ from G, and vice versa.

Theorem 3. Suppose Algorithm $\mathrm{code}_\pi(G)$ satisfies the separation property.

1. If the number of distinct n-node $\pi$-graphs is at most $2^{\beta(n)+o(\beta(n))}$ for some function
$\beta(n) = \Omega(n)$, then $|\mathrm{code}_\pi(G_0)| \le \beta(n_0) + o(\beta(n_0))$ for any $n_0$-node $\pi$-graph
$G_0$.

2. If $f(n) = 2^{n^{O(1)}}$ and $g(n) = O(n)$, then $G_0$ and $\mathrm{code}_\pi(G_0)$ can be obtained
from each other in $O(n_0 \log n_0)$ time.

Proof. Statement 1. Many graphs may appear during the execution of Algorithm
$\mathrm{code}_\pi(G_0)$. These graphs can be organized as nodes of a binary tree T rooted at $G_0$,
where (i) if $G_1$ and $G_2$ are obtained from G by calling $\mathrm{code}_\pi(G)$, then $G_1$ and $G_2$ are
the children of G in T, and (ii) if $|G| \le \lambda$, then G has no children in T.

Consider the multiset S consisting of all graphs G that are nodes of T. We partition
S into $\ell + 1$ multisets $S(0), S(1), S(2), \ldots, S(\ell)$. $S(0)$ consists of the graphs G with
$|G| \le \lambda$. For $i \ge 1$, $S(i)$ consists of the graphs G with $r^{i-1}\lambda < |G| \le r^i\lambda$. Let
$G_0 \in S(\ell)$. Clearly, $\ell = O(\log \frac{n_0}{\lambda})$.

Define $p = \sum_{H \in S(0)} |H|$. We first show

$|S(i)| \le \frac{p}{r^{i-1}\lambda}$, (1)

for every $1 \le i \le \ell$. Let G be a graph in $S(i)$. Let $S(0, G)$ be the set consisting of the
leaf descendants of G in T; for example, $S(0, G_0) = S(0)$. By Condition P2,
$|G| \le \sum_{H \in S(0,G)} |H|$. By Condition P1, no two graphs in $S(i)$ are related in T. Therefore
$S(i)$ contains at most one ancestor of H in T for every graph H in $S(0)$. It follows that
$\sum_{G \in S(i)} |G| \le \sum_{G \in S(i)} \sum_{H \in S(0,G)} |H| \le p$. Since $|G| > r^{i-1}\lambda$ for every G in $S(i)$,
Inequality (1) holds.

Suppose the children of G in T are $G_1$ and $G_2$. Let $b(G) = |X| + |\chi|$, where $\chi$ is the
auxiliary binary string for $\mathrm{code}_\pi(G_1) + \mathrm{code}_\pi(G_2) + X$. Let $q = \sum_{i \ge 1} \sum_{G \in S(i)} b(G)$.
Clearly $|\mathrm{code}_\pi(G_0)| = \sum_{H \in S(0)} |\mathrm{code}_\pi(H)| + q \le \sum_{H \in S(0)} (\beta(|H|) + o(\beta(|H|))) + q$.
Since $\beta(n) = \Omega(n)$, $|\mathrm{code}_\pi(G_0)| \le \beta(p) + q + o(\beta(p))$. Therefore Statement 1 can
be proved by showing that $p = n_0 + o(n_0)$ and $q = o(n_0)$.

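By Condition P3, $|X| = O(|G|^c)$. By Fact 1, $|\chi| = O(\log |G|)$. Thus, $b(G) = O(|G|^c)$, and

$q = \sum_{i \ge 1} \sum_{G \in S(i)} O(|G|^c)$. (2)

Now we regard the execution of $\mathrm{code}_\pi(G_0)$ as a process of growing T. Let
$a(T) = \sum_{H \text{ is a leaf of } T} |H|$. At the beginning of the function call $\mathrm{code}_\pi(G_0)$, T has
exactly one node $G_0$, and thus $a(T) = n_0$. At the end of the function call, T is fully
expanded, and thus $a(T) = p$. By Condition P2, during the execution of $\mathrm{code}_\pi(G_0)$, every
function call $\mathrm{code}_\pi(G)$ with $|G| > \lambda$ increases $a(T)$ by $O(|G|^c)$. Hence

$p = n_0 + \sum_{i \ge 1} \sum_{G \in S(i)} O(|G|^c)$. (3)

By Equations (2) and (3), it remains to prove $\sum_{i \ge 1} \sum_{G \in S(i)} |G|^c = o(n_0)$. Since
$|G| \le r^i\lambda$ for every G in $S(i)$, Inequality (1) gives

$\sum_{i \ge 1} \sum_{G \in S(i)} |G|^c \le \sum_{i \ge 1} (r^i\lambda)^c \frac{p}{r^{i-1}\lambda} = p\lambda^{c-1} \sum_{i \ge 1} r^{(c-1)i+1} = p\lambda^{c-1} O(1) = o(p)$. (4)

The geometric sum is $O(1)$ because $c < 1$ and $r > 1$, and $\lambda^{c-1} = o(1)$ because $\lambda$ grows
with $n_0$. By Equations (3) and (4), $p = n_0 + o(p)$, and thus $p = O(n_0)$. Hence

$\sum_{i \ge 1} \sum_{G \in S(i)} |G|^c = o(n_0)$.

By Equations (2) and (3), $p = n_0 + o(n_0)$ and $q = o(n_0)$, finishing the proof of
Statement 1.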
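
The bookkeeping in this proof can be checked numerically on a toy split rule. The sketch below is an illustration only, with assumed parameters: a graph of size n splits into two parts that each receive about half of the nodes plus `isqrt(n)` shared or auxiliary nodes (so c = 1/2), and |X| is taken to be `isqrt(n)`. The function name and the threshold check are hypothetical.

```python
from math import isqrt


def simulate(n0, lam):
    """Return (p, q): total leaf size and total overhead for the toy split rule."""
    if lam < 16:
        raise ValueError("toy rule assumes lam >= 16 so that both parts shrink")
    p = q = 0
    stack = [n0]
    while stack:
        n = stack.pop()
        if n <= lam:
            p += n                          # leaf of T: contributes its size to p
        else:
            s = isqrt(n)
            q += s                          # |X| for this call, O(n^c) with c = 1/2
            stack.append(n // 2 + s)        # size of the first part
            stack.append(n - n // 2 + s)    # size of the second part
    return p, q
```

For a fixed input size, increasing `lam` shrinks the relative overhead $(p - n_0)/n_0$, matching the factor $\lambda^{c-1}$ in Equation (4). Since $\lambda = \log\log\log(n_0)$ grows extremely slowly, the effect is asymptotic rather than something visible at practical sizes.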

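Statement 2. Since $|G| \le r^i\lambda$ and $|S(i)| = O(\frac{n_0}{r^i\lambda})$ for every $0 \le i \le \ell$ and for
every G in $S(i)$, $G_0$ and $\mathrm{code}_\pi(G_0)$ can be obtained from each other in time

$\frac{n_0}{\lambda} O(f(\lambda) + \sum_{1 \le i \le \ell} r^{-i} g(r^i\lambda))$.

Clearly $f(\lambda) = 2^{\lambda^{O(1)}} = 2^{o(\log\log n_0)} = o(\log n_0)$. Since $\ell = O(\log n_0)$ and
$g(n) = O(n)$, $\sum_{1 \le i \le \ell} r^{-i} g(r^i\lambda) = O(\lambda \log n_0)$, and Statement 2
follows immediately.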

The next two sections use Theorem 3 to encode various classes of graphs G. Section 3 
considers plane triangulations. Section 4 considers planar graphs and plane graphs. 

3 Plane Triangulation 

In this section, a $\pi$-graph is a directed or undirected plane triangulation which has at
most $\ell_1$ (respectively, $\ell_2$) distinct node (respectively, edge) labels. The number of distinct
n-node $\pi$-graphs is $2^{\Omega(n)}$ [18]. Our encoding scheme is based on the next fact.

Fact 4 (See [12]) Let G be an n-node plane triangulation. We can compute a cycle C
of G in $O(n)$ time such that

- C has at most $\sqrt{8n}$ nodes; and

- the numbers of G's nodes inside and outside C are each at most 2n/3.

Let G be a given n-node $\pi$-graph. Let C be a cycle of G guaranteed by Fact 4. Let
$G_{in}$ (respectively, $G_{out}$) be the subgraph of G formed by C and the part of G inside
(respectively, outside) C. Let x be an arbitrary node on C.

$G_1$ is obtained by placing a cycle $C_1$ of three nodes surrounding $G_{in}$ and then
triangulating the interior face of $C_1$ such that a particular node $y_1$ of $C_1$ has degree
strictly higher than the other two. The labels for the nodes and edges of $G_1 - G_{in}$ can
be assigned arbitrarily.

$G_2$ is obtained from $G_{out}$ by (a) placing a cycle $C_2$ of three nodes surrounding $G_{out}$
and then triangulating the interior face of $C_2$ such that a particular node $y_2$ of $C_2$ has
degree strictly higher than the other two; and (b) triangulating the interior face of C by
placing a new node z inside of C and then connecting it to each node of C by an edge.
Similarly, the labels for the nodes and edges of $G_2 - G_{out}$ can be assigned arbitrarily.

Define $\mathrm{dfs}(u, G, v)$ as follows. When $\mathrm{dfs}(u, G, v)$ is called, v is always a node on
$B(G)$. Let w be the counterclockwise neighbor of v on $B(G)$. We perform a depth-first
search of G starting from v such that (1) the neighbors of each node are visited
in the counterclockwise order; and (2) w is the second visited node. Let $\mathrm{dfs}(u, G, v)$
be the binary index of u in the above depth-first search. Let
$X = \mathrm{dfs}(x, G_1, y_1) + \mathrm{dfs}(x, G_2, y_2) + \mathrm{dfs}(z, G_2, y_2)$.
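
The depth-first index can be sketched as below. This is a hedged illustration, not the authors' implementation: the embedding is assumed to be given as a dictionary `ccw` listing each node's neighbors in counterclockwise order, w is passed in explicitly instead of being derived from $B(G)$, and the scan at each newly visited node is assumed to start from the node it was reached from (the text does not fix this detail). Function names are hypothetical.

```python
def dfs_order(ccw, v, w):
    """Visit order of a DFS from v whose second visited node is w."""
    def neighbors(a, first):
        # Iterate over a's neighbors in ccw order, starting at `first` if present.
        nb = ccw[a]
        k = nb.index(first) if first in nb else 0
        return iter(nb[k:] + nb[:k])

    order, seen = [v], {v}
    stack, path = [neighbors(v, w)], [v]
    while stack:
        for b in stack[-1]:
            if b not in seen:
                seen.add(b)
                order.append(b)
                stack.append(neighbors(b, path[-1]))
                path.append(b)
                break
        else:
            stack.pop()                     # all neighbors explored: backtrack
            path.pop()
    return order


def dfs_index(u, ccw, v, w):
    """Binary index of u in the DFS order, padded to a fixed width."""
    order = dfs_order(ccw, v, w)
    width = max(1, (len(order) - 1).bit_length())
    return format(order.index(u), "0{}b".format(width))
```

Because the search is fully determined by the embedding, the start node, and w, a decoder that knows those can rerun it and look up the node at the stored index.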

Lemma 5. 1. $G_1$ and $G_2$ are $\pi$-graphs.

2. There exists a constant $r > 1$ with $\max(|G_1|, |G_2|) \le n/r$.

3. $|G_1| + |G_2| = n + O(\sqrt{n})$.

4. $|X| = O(\log n)$.

5. $G_1, G_2, X$ can be obtained from G in $O(n)$ time.

6. G can be obtained from $G_1, G_2, X$ in $O(n)$ time.

Proof. Statements 1-5 are straightforward by Fact 4 and the definitions of $G_1$, $G_2$ and
X. Statement 6 is proved as follows. It takes $O(n)$ time to locate $y_1$ (respectively, $y_2$) in
$G_1$ (respectively, $G_2$) by looking for the node with the highest degree on $B(G_1)$
(respectively, $B(G_2)$). By Fact 1, it takes $O(1)$ time to obtain $\mathrm{dfs}(x, G_1, y_1)$, $\mathrm{dfs}(x, G_2, y_2)$,
and $\mathrm{dfs}(z, G_2, y_2)$ from X. Therefore x and z can be located in $G_1$ and $G_2$ in $O(n)$ time
by depth-first traversal. Now $G_{in}$ can be obtained from $G_1$ by removing $B(G_1)$ and its
incident edges. The cycle C in $G_{in}$ is simply $B(G_{in})$. Also, $G_{out}$ can be obtained from $G_2$
by removing $B(G_2)$, z, and their incident edges. The cycle C in $G_{out}$ is simply the boundary
of the face that encloses z and its incident edges in $G_2$. Since we know the positions of
x in $G_{in}$ and $G_{out}$, G can be obtained from $G_{in}$ and $G_{out}$ by fitting them together along
C by aligning x. The overall time complexity is $O(n)$.

Theorem 6. Let $G_0$ be an $n_0$-node $\pi$-graph. Then $G_0$ and $\mathrm{code}_\pi(G_0)$ can be obtained
from each other in $O(n_0 \log n_0)$ time. Moreover, $|\mathrm{code}_\pi(G_0)| \le \beta(n_0) + o(\beta(n_0))$
for any function $\beta(n)$ such that the number of distinct n-node $\pi$-graphs is at most
$2^{\beta(n)+o(\beta(n))}$.

Proof. One can easily see that determining whether two n-node $\pi$-graphs are equivalent
can be done in $2^{n^{O(1)}}$ time. It is also clear that the class of n-node $\pi$-graphs can be
enumerated in $2^{n^{O(1)}}$ time. Therefore the table M for n-node $\pi$-graphs can be obtained
in $2^{n^{O(1)}}$ time. By Lemma 2, $\mathrm{index}_\pi(G)$ and G can be obtained from each other in
$2^{|G|^{O(1)}}$ time. Thus the theorem follows from Theorem 3 and Lemma 5.

4 Planar Graphs and Plane Graphs 

In this section, $\pi$ can be any conjunction of the following properties of G:

- G is a planar graph or a plane graph;

- G is directed or undirected;

- G is triconnected, biconnected, connected, or not required to be connected; and

- G has at most $\ell_1$ (respectively, $\ell_2$) distinct node (respectively, edge) labels.

Clearly there are $2^{\Omega(n)}$ distinct n-node $\pi$-graphs.

Fig. 1. A k-wheel graph $W_k$.

Let G be an input n-node $\pi$-graph. For the cases of $\pi$ being a planar graph rather than
a plane graph, let G be embedded first. Note that this is only for the encoding process
to be able to apply Fact 4. At the base level, we still use the table M for $\pi$-graphs rather
than the much larger table M' for embedded $\pi$-graphs. As shown below, the decoding
process does not require the $\pi$-graphs to be embedded.

Let G' be obtained from triangulating G. Let C be a cycle of G' guaranteed by
Fact 4. Let $G_C$ be the union of G and C. Let $G_{in}$ (respectively, $G_{out}$) be the subgraph
of $G_C$ formed by C and the part of $G_C$ inside (respectively, outside) C. Let
$C = x_1 x_2 \cdots x_\ell x_{\ell+1}$, where $x_{\ell+1} = x_1$. By Fact 4, $\ell = O(\sqrt{n})$.

Lemma 7. Let H be an $O(n)$-node planar graph. There exists an integer k with
$n^{0.6} \le k \le n^{0.7}$ such that H does not contain any node of degree k or k - 1.

Proof. Assume for a contradiction that such a k does not exist. It follows that the sum
of degrees of all nodes in H is at least $(n^{0.6} + n^{0.7})(n^{0.7} - n^{0.6})/4 = \Omega(n^{1.4})$. This
contradicts the fact that H is planar.
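
Finding such a k is a simple scan over the degrees that occur in H. The sketch below is a minimal illustration with a hypothetical function name; the search range is passed in as `lo` and `hi` rather than being computed from n.

```python
def find_free_degree(degrees, lo, hi):
    """Return the smallest k in [lo, hi] such that neither k nor k - 1 is a degree."""
    present = set(degrees)
    for k in range(lo, hi + 1):
        if k not in present and (k - 1) not in present:
            return k
    return None                             # no such k in the given range
```

The counting argument in the proof shows that the scan cannot fail on a planar graph when the range is wide enough: a failure would force too many distinct large degrees for a graph with $O(n)$ edges.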

Let $W_k$, with $k > 3$, be a k-wheel graph defined as follows. As shown in Fig. 1, $W_k$
consists of k + 1 nodes $w_0, w_1, w_2, \ldots, w_{k-1}, w_k$, where $w_1, w_2, \ldots, w_k, w_1$ form a
cycle. $w_0$ is a degree-k node incident to each node on the cycle. Finally, $w_1$ is incident to
$w_{k-1}$. Clearly $W_k$ is triconnected. Also, $w_1$ and $w_{k-1}$ are the only degree-four neighbors
of $w_0$ in $W_k$. Let $k_1$ (respectively, $k_2$) be an integer k guaranteed by Lemma 7 for $G_{in}$
(respectively, $G_{out}$). Now we define $G_1$, $G_2$ and X as follows.
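
The k-wheel graph is easy to build explicitly, which gives a quick way to inspect its degrees. In this sketch, offered only as an illustration, node i stands for $w_i$, edges are stored as frozensets, and both function names are hypothetical.

```python
def wheel(k):
    """Edge set of the k-wheel graph on nodes 0..k (requires k > 3)."""
    if k <= 3:
        raise ValueError("the construction assumes k > 3")
    edges = set()
    for i in range(1, k + 1):
        j = i % k + 1                       # successor of w_i on the cycle w_1 ... w_k w_1
        edges.add(frozenset((i, j)))
        edges.add(frozenset((0, i)))        # hub w_0 is adjacent to every cycle node
    edges.add(frozenset((1, k - 1)))        # extra edge between w_1 and w_{k-1}
    return edges


def degrees(edges):
    """Map each node to its degree."""
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return deg
```

Printing `degrees(wheel(k))` for a small k shows which cycle nodes stand out by degree, which is the property the decoding step relies on to orient the wheel.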

$G_1$ is obtained from $G_{in}$ and a $k_1$-wheel graph $W_{k_1}$ by adding an edge $(w_i, x_i)$ for
every $i = 1, \ldots, \ell$. Clearly for the case of $\pi$ being a plane graph, $G_1$ can be embedded
such that $W_{k_1}$ is outside $G_{in}$, as shown in Fig. 2(a). Thus, the original embedding of $G_{in}$
can be obtained from $G_1$ by removing all nodes of $W_{k_1}$. The node labels, edge labels,
and edge directions of $G_1 - G_{in}$ can be assigned arbitrarily.

$G_2$ is obtained from $G_{out}$ and a $k_2$-wheel graph $W_{k_2}$ by adding an edge $(w_i, x_i)$ for
every $i = 1, \ldots, \ell$. Clearly for the case of $\pi$ being a plane graph, $G_2$ can be embedded
such that $W_{k_2}$ is inside C, as shown in Fig. 2(b). Thus, the original embedding of $G_{out}$
can be obtained from $G_2$ by removing all nodes of $W_{k_2}$. The node labels, edge labels,
and edge directions of $G_2 - G_{out}$ can be assigned arbitrarily.

Fig. 2. $G_1$ and $G_2$. The gray area of $G_1$ is $G_{in}$. The gray area of $G_2$ is $G_{out}$.

Let X be an $O(\sqrt{n})$-bit string which encodes $k_1$, $k_2$, and whether each edge $(x_i, x_{i+1})$
is an original edge in G, for $i = 1, \ldots, \ell$.

Lemma 8. 1. $G_1$ and $G_2$ are $\pi$-graphs.

2. There exists a constant $r > 1$ with $\max(|G_1|, |G_2|) \le n/r$.

3. $|G_1| + |G_2| = n + O(n^{0.7})$.

4. $|X| = O(\sqrt{n})$.

5. $G_1, G_2, X$ can be obtained from G in $O(n)$ time.

6. G can be obtained from $G_1, G_2, X$ in $O(n)$ time.

Proof. Since $W_{k_1}$ and $W_{k_2}$ are both triconnected, and each node of C has degree at
least three in $G_1$ and $G_2$, Statement 1 holds for each case of the connectivity of the input
$\pi$-graph G. Statements 2-5 are straightforward by Fact 4 and the definitions of $G_1$, $G_2$
and X. Statement 6 is proved as follows. First of all, we obtain $k_1$ from X. Since $G_{in}$
does not contain any node of degree $k_1$ or $k_1 - 1$, $w_0$ is the only degree-$k_1$ node in $G_1$.
Therefore it takes $O(n)$ time to identify $w_0$ in $G_1$. $w_{k_1}$ is the only degree-3 neighbor of
$w_0$. Since $k_1 > \ell$, $w_1$ is the only degree-5 neighbor of $w_0$. $w_2$ is the common neighbor
of $w_0$ and $w_1$ that is not adjacent to $w_{k_1}$. From now on, $w_i$, for each $i = 3, 4, \ldots, \ell$,
is the common neighbor of $w_0$ and $w_{i-1}$ other than $w_{i-2}$. Clearly, $w_1, w_2, \ldots, w_\ell$ and
thus $x_1, x_2, \ldots, x_\ell$ can be identified in $O(n)$ time. $G_{in}$ can now be obtained from $G_1$ by
removing $W_{k_1}$. Similarly, $G_{out}$ can be obtained from $G_2$ and X by deleting $W_{k_2}$ after
identifying $x_1, x_2, \ldots, x_\ell$. Finally, $G_C$ can be recovered by fitting $G_{in}$ and $G_{out}$ together
by aligning $x_1, x_2, \ldots, x_\ell$. Based on X, G can then be obtained from $G_C$ by removing
the edges of C that are not originally in G.

Remark. In the proof for Statement 6 of Lemma 8, identifying the degree-$k_1$ node
(and the $k_1$-wheel graph $W_{k_1}$) does not require the embedding for $G_1$. Therefore the
decoding process does not require the $\pi$-graphs to be embedded. This is different from
the proof of Lemma 5.

Theorem 9. Let $G_0$ be an $n_0$-node $\pi$-graph. Then $G_0$ and $\mathrm{code}_\pi(G_0)$ can be obtained
from each other in $O(n_0 \log n_0)$ time. Moreover, $|\mathrm{code}_\pi(G_0)| \le \beta(n_0) + o(\beta(n_0))$
for any function $\beta(n)$ such that the number of distinct n-node $\pi$-graphs is at most
$2^{\beta(n)+o(\beta(n))}$.

Proof. One can easily see that determining whether two n-node $\pi$-graphs are equivalent
can be done in $2^{n^{O(1)}}$ time. It is also clear that the class of n-node $\pi$-graphs can be
enumerated in $2^{n^{O(1)}}$ time. Therefore the table M for n-node $\pi$-graphs can be obtained
in $2^{n^{O(1)}}$ time. By Lemma 2, $\mathrm{index}_\pi(G)$ and G can be obtained from each other in
$2^{|G|^{O(1)}}$ time. The theorem follows from Theorem 3 and Lemma 8.

5 Future Directions 

The coding schemes presented in this paper require $O(n \log n)$ time for encoding and
decoding. An immediate open question is whether one can encode some graphs other
than rooted trees in $O(n)$ time using an information-theoretically minimum number of bits.